	1	# The Make-A-Lisp Process
	2
	3	So you want to write a Lisp interpreter? Welcome!
	4
	5	The goal of the Make-A-Lisp project is to make it easy to write your
	6	own Lisp interpreter without sacrificing those many "Aha!" moments
	7	that come from ascending the McCarthy mountain. When you reach the peak
	8	of this particular mountain, you will have an interpreter for the mal
	9	Lisp language that is powerful enough to be self-hosting, meaning it
	10	will be able to run a mal interpreter written in mal itself.
	11
	12	So jump right in (er ... start the climb)!
	13
	14	- [Pick a language](#pick-a-language)
	15	- [Getting started](#getting-started)
	16	- [General hints](#general-hints)
	17	- [The Make-A-Lisp Process](#the-make-a-lisp-process-1)
	18	- [Step 0: The REPL](#step-0-the-repl)
	19	- [Step 1: Read and Print](#step-1-read-and-print)
	20	- [Step 2: Eval](#step-2-eval)
	21	- [Step 3: Environments](#step-3-environments)
	22	- [Step 4: If Fn Do](#step-4-if-fn-do)
	23	- [Step 5: Tail call optimization](#step-5-tail-call-optimization)
	24	- [Step 6: Files, Mutation, and Evil](#step-6-files-mutation-and-evil)
	25	- [Step 7: Quoting](#step-7-quoting)
	26	- [Step 8: Macros](#step-8-macros)
	27	- [Step 9: Try](#step-9-try)
	28	- [Step A: Metadata, Self-hosting and Interop](#step-a-metadata-self-hosting-and-interop)
	29
	30
	31	## Pick a language
	32
	33	You might already have a language in mind that you want to use.
	34	Technically speaking, mal can be implemented in any sufficiently
complete programming language (i.e. Turing complete); however, there are a few
	36	language features that can make the task MUCH easier. Here are some of
	37	them in rough order of importance:
	38
	39	* A sequential compound data structure (e.g. arrays, lists,
	40	vectors, etc)
	41	* An associative compound data structure (e.g. a dictionary,
	42	hash-map, associative array, etc)
	43	* Function references (first class functions, function pointers,
	44	etc)
	45	* Real exception handling (try/catch, raise, throw, etc)
	46	* Variable argument functions (variadic, var args, splats, apply, etc)
	47	* Function closures
	48	* PCRE regular expressions
	49
	50	In addition, the following will make your task especially easy:
	51
	52	* Dynamic typing / boxed types (specifically, the ability to store
	53	different data types in the sequential and associative structures
	54	and the language keeps track of the type for you)
	55	* Compound data types support arbitrary runtime "hidden" data
(metadata, metatables, dynamic field attributes)
	57
	58	Here are some examples of languages that have all of the above
	59	features: JavaScript, Ruby, Python, Lua, R, Clojure.
	60
	61	Michael Fogus has some great blog posts on interesting but less well
	62	known languages and many of the languages on his lists do not yet have
	63	any mal implementations:
	64	* http://blog.fogus.me/2011/08/14/perlis-languages/
	65	* http://blog.fogus.me/2011/10/18/programming-language-development-the-past-5-years/
	66
Many of the most popular languages already have mal implementations.
However, this should not discourage you from creating your own
implementation in a language that already has one. If you go
this route, though, I suggest you avoid referring to the existing
implementations (i.e. "cheating") to maximize your learning experience
instead of just borrowing mine. On the other hand, if your goal is to
add new implementations to mal as efficiently as possible, then you
SHOULD find the most similar target language implementation and refer
to it frequently.
	76
	77	If you want a fairly long list of programming languages with an
	78	approximate measure of popularity, try the [Programming Language
Popularity Chart](http://langpop.corger.nl/).
	80
	81
	82	## Getting started
	83
	84	* Install your chosen language interpreter/compiler, language package
	85	manager and build tools (if applicable)
	86
	87	* Fork the mal repository on github and then clone your forked
	88	repository:
	89	```
	90	git clone git@github.com:YOUR_NAME/mal.git
	91	cd mal
	92	```
	93
	94	* Make a new directory for your implementation. For example, if your
	95	language is called "quux":
	96	```
	97	mkdir quux
	98	```
	99
	100	* Modify the top level Makefile to allow the tests to be run against
	101	your implementation. For example, if your language is named "quux"
	102	and uses "qx" as the file extension, then make the following
	103	3 modifications to Makefile:
	104	```
	105	IMPLS = ... quux ...
	106	...
quux_STEP_TO_PROG = quux/$($(1)).qx
	108	...
	109	quux_RUNSTEP = ../$(2) $(3)
	110	```
	111
	112	This allows you to run tests against your implementation like this:
	113	```
	114	make "test^quux^stepX"
	115	```
	116
	117	TODO: If your implementation language is a compiled language, then you
	118	should also add a Makefile at the top level of your implementation
	119	directory.
	120
	121	Your Makefile will define how to build the files pointed to by the
	122	quux_STEP_TO_PROG macro. The top-level Makefile will attempt to build
those targets before running tests. If your implementation language is a
scripting (uncompiled) language, then no Makefile is necessary because
quux_STEP_TO_PROG will point to a source file that already exists and
does not need to be compiled/built.
	127
	128
	129	## General hints
	130
Stack Overflow and Google are your best friends. Modern polyglot
	132	developers do not memorize dozens of programming languages. Instead,
	133	they learn the peculiar terminology used with each language and then
	134	use this to search for their answers.
	135
	136	Here are some other resources where multiple languages are
	137	compared/described:
	138	* http://learnxinyminutes.com/
	139	* http://hyperpolyglot.org/
	140	* http://rosettacode.org/
	141	* http://rigaux.org/language-study/syntax-across-languages/
	142
	143	Do not let yourself be bogged down by specific problems. While the
	144	make-a-lisp process is structured as a series of steps, the reality is
	145	that building a lisp interpreter is more like a branching tree. If you
	146	get stuck on tail call optimization, or hash-maps, move on to other
	147	things. You will often have a stroke of inspiration for a problem as
	148	you work through other functionality. I have tried to structure this
	149	guide and the tests to make clear which things can be deferred until
	150	later.
	151
	152	An aside on deferrable/optional bits: when you run the tests for
	153	a given step, the last tests are often marked with an "optional"
	154	header. This indicates that these are tests for functionality that is
	155	not critical to finish a basic mal implementation. Many of the steps
in this process guide have a "Deferrable" section; however, it does not
have quite the same meaning. Those sections include the functionality that
	158	is marked as optional in the tests, but they also include
	159	functionality that becomes mandatory at a later step. In other words,
	160	this is a "make your own Lisp adventure".
	161
	162	Use test driven development. Each step of the make-a-lisp process has
	163	a bunch of tests associated with it and there is an easy script to run
	164	all the tests for a specific step in the process. Pick a failing test,
	165	fix it, repeat until all the tests for that step pass.
	166
	167	## Reference Code
	168
	169	The `process` directory contains abbreviated pseudocode and
	170	architecture images for each step of the make-a-lisp process. Use
	171	a textual diff/comparison tool to compare the previous pseudocode step
	172	with the one you are working on. The architecture images have changes
	173	from the previous step highlighted in red.
	174
	175	If you get completely stuck and are feeling like giving up, then you
	176	should "cheat" by referring to the same step or functionality in
an existing implementation. You are here to learn, not to take
	178	a test, so do not feel bad about it. Okay, you should feel a little
	179	bit bad about it.
	180
	181
	182	## The Make-A-Lisp Process
	183
	184	In the steps that follow the name of the target language is "quux" and
	185	the file extension for that language is "qx".
	186
	187
	188	<a name="step0"></a>
	189
	190	### Step 0: The REPL
	191
	192	![step0_repl architecture](step0_repl.png)
	193
	194	This step is basically just creating a skeleton of your interpreter.
	195
	196	* Create a `step0_repl.qx` file in `quux/`.
	197
	198	* Add the 4 trivial functions `READ`, `EVAL`, `PRINT`, and `rep`
	199	(read-eval-print). `READ`, `EVAL`, and `PRINT` are basically just
stubs that return their first parameter (a string if your target
language is statically typed) and `rep` calls them in order,
passing the return value of each to the next.
	203
	204	* Add a main loop that repeatedly prints a prompt (needs to be
	205	"user> " for later tests to pass), gets a line of input from the
	206	user, calls `rep` with that line of input, and then prints out the
	207	result from `rep`. It should also exit when you send it an EOF
	208	(often Ctrl-D).
	209
	210	* If you are using a compiled (ahead-of-time rather than just-in-time)
	211	language, then create a Makefile (or appropriate project definition
	212	file) in your directory.
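A minimal sketch of the step 0 skeleton, using Python as a stand-in for "quux". The `read_line` parameter on `main` is a hypothetical hook added only so the loop can be driven without a terminal; a real implementation would call its language's own line-reading function directly.

```python
# step0_repl.py: REPL skeleton sketch (Python standing in for "quux").


def READ(s):
    """Stub: later steps parse the string into a mal AST."""
    return s


def EVAL(ast):
    """Stub: later steps evaluate the AST."""
    return ast


def PRINT(exp):
    """Stub: later steps turn a mal value back into a string."""
    return exp


def rep(s):
    """Pass each stage's return value to the next stage."""
    return PRINT(EVAL(READ(s)))


def main(read_line=input):
    """Prompt, read a line, print rep(line); stop at EOF (Ctrl-D)."""
    while True:
        try:
            line = read_line("user> ")
        except EOFError:
            print()  # leave the terminal on a fresh line
            break
        print(rep(line))
```

Calling `main()` starts the loop. Pressing Ctrl-D at the prompt makes `input` raise `EOFError`, which ends it.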
	213
	214	It is time to run your first tests. This will check that your program
	215	does input and output in a way that can be captured by the test
	216	harness. Go to the top level and run the following:
	217	```
	218	make "test^quux^step0"
	219	```
	220
	221	Add and then commit your new `step0_repl.qx` and `Makefile` to git.
	222
	223	Congratulations! You have just completed the first step of the
	224	make-a-lisp process.
	225
	226
	227	#### Optional:
	228
* Add full line editing and command history support to your
interpreter REPL. Many languages have a library/module that provides
line editing support. Another option, if your language supports it, is
to use an FFI (foreign function interface) to load and call directly
into the GNU readline, editline, or linenoise library. Add line
editing interface code to `readline.qx`.
	235
	236
	237	<a name="step1"></a>
	238
	239	### Step 1: Read and Print
	240
	241	![step1_read_print architecture](step1_read_print.png)
	242
	243	In this step, your interpreter will "read" the string from the user
	244	and parse it into an internal tree data structure (an abstract syntax
	245	tree) and then take that data structure and "print" it back to
	246	a string.
	247
	248	In non-lisp languages, this step (called "lexing and parsing") can be
	249	one of the most complicated parts of the compiler/interpreter. In
	250	Lisp, the data structure that you want in memory is basically
	251	represented directly in the code that the programmer writes
	252	(homoiconicity).
	253
	254	For example, if the string is "(+ 2 (* 3 4))" then the read function
	255	will process this into a tree structure that looks like this:
	256	```
            List
          /  |  \
         /   |   \
        /    |    \
   Sym:+   Int:2   List
                  /  |  \
                 /   |   \
                /    |    \
           Sym:*   Int:3   Int:4
	266	```
	267
	268	Each left paren and its matching right paren (lisp "sexpr") becomes
	269	a node in the tree and everything else becomes a leaf in the tree.
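In a dynamically typed host language, the tree above can simply be nested lists. This Python sketch assumes a hypothetical `Symbol` class (a `str` subclass) and walks the structure to show that the lists are the nodes and everything else is a leaf.

```python
class Symbol(str):
    """Hypothetical mal symbol type: a str subclass holding the name."""


# "(+ 2 (* 3 4))" as nested lists: each list is a node, the rest are leaves.
ast = [Symbol("+"), 2, [Symbol("*"), 3, 4]]


def leaves(node):
    """Return the leaves in left-to-right order."""
    if isinstance(node, list):
        result = []
        for child in node:
            result.extend(leaves(child))
        return result
    return [node]


def depth(node):
    """Count nested list levels; each paren pair adds one."""
    if isinstance(node, list):
        return 1 + max((depth(child) for child in node), default=0)
    return 0
```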
	270
	271	If you can find code for an implementation of a JSON encoder/decoder
	272	in your target language then you can probably just borrow and modify
	273	that and be 75% of the way done with this step.
	274
	275	The rest of this section is going to assume that you are not starting
	276	from an existing JSON encoder/decoder, but that you do have access to
	277	a Perl compatible regular expressions (PCRE) module/library. You can
	278	certainly implement the reader using simple string operations, but it
	279	is more involved. The `make`, `ps` (postscript) and Haskell
	280	implementations have examples of a reader/parser without using regular
	281	expression support.
	282
	283	* Copy `step0_repl.qx` to `step1_read_print.qx`.
	284
	285	* Add a `reader.qx` file to hold functions related to the reader.
	286
* If the target language has object types (OOP), then the next step
	288	is to create a simple stateful Reader object in `reader.qx`. This
	289	object will store the tokens and a position. The Reader object will
	290	have two methods: `next` and `peek`. `next` returns the token at
	291	the current position and increments the position. `peek` just
	292	returns the token at the current position.
	293
	294	* Add a function `read_str` in `reader.qx`. This function
	295	will call `tokenizer` and then create a new Reader object instance
	296	with the tokens. Then it will call `read_form` with the Reader
	297	instance.
	298
	299	* Add a function `tokenizer` in `reader.qx`. This function will take
	300	a single string and return an array/list
	301	of all the tokens (strings) in it. The following regular expression
	302	(PCRE) will match all mal tokens.
	303	```
[\s,]*(~@|[\[\]{}()'`~^@]|"(?:\\.|[^\\"])*"|;.*|[^\s\[\]{}('"`,;)]*)
	305	```
  * For each match captured within the parentheses starting at char 6 of the
    regular expression, a new token will be created.

  * `[\s,]*`: Matches any number of whitespace characters or commas. This is
    not captured, so it will be ignored and not tokenized.

  * `~@`: Captures the special two-character sequence `~@` (tokenized).

  * ```[\[\]{}()'`~^@]```: Captures any special single character, one of
    ```[]{}()'`~^@``` (tokenized).

  * `"(?:\\.|[^\\"])*"`: Starts capturing at a double-quote and stops at the
    next double-quote unless it was preceded by a backslash, in which case it
    includes it until the next double-quote (tokenized).

  * `;.*`: Captures any sequence of characters starting with `;` (tokenized).

  * ```[^\s\[\]{}('"`,;)]*```: Captures a sequence of zero or more non-special
    characters (e.g. symbols, numbers, "true", "false", and "nil") and is sort
    of the inverse of the one above that captures special characters (tokenized).
	326
	327	* Add the function `read_form` to `reader.qx`. This function
	328	will peek at the first token in the Reader object and switch on the
	329	first character of that token. If the character is a left paren then
	330	`read_list` is called with the Reader object. Otherwise, `read_atom`
	331	is called with the Reader Object. The return value from `read_form`
	332	is a mal data type. If your target language is statically typed then
	333	you will need some way for `read_form` to return a variant or
	334	subclass type. For example, if your language is object oriented,
	335	then you can define a top level MalType (in `types.qx`) that all
	336	your mal data types inherit from. The MalList type (which also
inherits from MalType) will contain a list/array of other MalTypes.
	338	If your language is dynamically typed then you can likely just
	339	return a plain list/array of other mal types.
	340
	341	* Add the function `read_list` to `reader.qx`. This function will
	342	repeatedly call `read_form` with the Reader object until it
encounters a ')' token (if it reaches EOF before reading a ')' then
	344	that is an error). It accumulates the results into a List type. If
	345	your language does not have a sequential data type that can hold mal
	346	type values you may need to implement one (in `types.qx`). Note
	347	that `read_list` repeatedly calls `read_form` rather than
	348	`read_atom`. This mutually recursive definition between `read_list`
	349	and `read_form` is what allows lists to contain lists.
	350
	351	* Add the function `read_atom` to `reader.qx`. This function will
	352	look at the contents of the token and return the appropriate scalar
	353	(simple/single) data type value. Initially, you can just implement
numbers (integers) and symbols. This will allow you to proceed
	355	through the next couple of steps before you will need to implement
	356	the other fundamental mal types: nil, true, false, and string. The
	357	remaining mal types: keyword, vector, hash-map, and atom do not
	358	need to be implemented until step 9 (but can be implemented at any
point between this step and that). BTW, symbol types are just
objects that contain a single string name value (some languages have
symbol types already).
	362
	363	* Add a file `printer.qx`. This file will contain a single function
	364	`pr_str` which does the opposite of `read_str`: take a mal data
	365	structure and return a string representation of it. But `pr_str` is
	366	much simpler and is basically just a switch statement on the type of
	367	the input object:
	368
  * symbol: return the string name of the symbol
  * number: return the number as a string
  * list: iterate through each element of the list calling `pr_str` on
    it, then join the results with a space separator, and surround the
    final result with parens
	374
	375	* Change the `READ` function in `step1_read_print.qx` to call
	376	`reader.read_str` and the `PRINT` function to call `printer.pr_str`.
	377	`EVAL` continues to simply return its input but the type is now
	378	a mal data type.
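The reader and printer functions described above can be sketched in Python roughly as follows. This assumes plain Python lists for mal lists, `int` for numbers, and a hypothetical `Symbol` class (a `str` subclass) for symbols. Comment tokens are simply dropped in `tokenize`, and error handling covers only unbalanced parens.

```python
import re

# Tokenizer regex from the guide; group 1 holds each token.
TOKEN_RE = re.compile(
    r"""[\s,]*(~@|[\[\]{}()'`~^@]|"(?:\\.|[^\\"])*"|;.*|[^\s\[\]{}('"`,;)]*)"""
)


class Symbol(str):
    """Hypothetical mal symbol type: a str subclass holding the name."""


def tokenize(s):
    # findall returns group 1 for every match; drop the empty match at the
    # end of input and any comment tokens.
    return [t for t in TOKEN_RE.findall(s) if t and not t.startswith(";")]


class Reader:
    """Stateful token cursor with next() and peek()."""

    def __init__(self, tokens):
        self.tokens = tokens
        self.position = 0

    def next(self):
        token = self.tokens[self.position]
        self.position += 1
        return token

    def peek(self):
        if self.position < len(self.tokens):
            return self.tokens[self.position]
        return None  # no more tokens


def read_str(s):
    return read_form(Reader(tokenize(s)))


def read_form(reader):
    token = reader.peek()
    if token is None:
        raise SyntaxError("unexpected end of input")
    if token == "(":
        return read_list(reader)
    if token == ")":
        raise SyntaxError("unexpected ')'")
    return read_atom(reader)


def read_list(reader):
    reader.next()  # consume "("
    items = []
    while True:
        token = reader.peek()
        if token is None:
            raise SyntaxError("expected ')', got EOF")
        if token == ")":
            reader.next()
            return items
        # Calling read_form (not read_atom) lets lists contain lists.
        items.append(read_form(reader))


def read_atom(reader):
    token = reader.next()
    if re.fullmatch(r"-?\d+", token):
        return int(token)
    return Symbol(token)


def pr_str(obj):
    if isinstance(obj, list):
        return "(" + " ".join(pr_str(x) for x in obj) + ")"
    return str(obj)  # symbols and integers
```

With this sketch, `pr_str(read_str("( + 2 (* 3 4) ) "))` produces `(+ 2 (* 3 4))`, matching the manual test cases listed next.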
	379
	380	You now have enough hooked up to begin testing your code. You can
	381	manually try some simple inputs:
	382	* `123` -> `123`
	383	* ` 123 ` -> `123`
	384	* `abc` -> `abc`
	385	* ` abc ` -> `abc`
	386	* `(123 456)` -> `(123 456)`
	387	* `( 123 456 789 ) ` -> `(123 456 789)`
	388	* `( + 2 (* 3 4) ) ` -> `(+ 2 (* 3 4))`
	389
	390	To verify that your code is doing more than just eliminating extra
	391	spaces (and not failing), you can instrument your `reader.qx` functions.
	392
	393	Once you have gotten past those simple manual tests, it is time to run
	394	the full suite of step 1 tests. Go to the top level and run the
	395	following:
	396	```
	397	make "test^quux^step1"
	398	```
	399
	400	Fix any test failures related to symbols, numbers and lists.
	401
	402	Depending on the functionality of your target language, it is likely
	403	that you have now just completed one of the most difficult steps. It
is downhill from here. The remaining steps will probably be easier
	405	and each step will give progressively more bang for the buck.
	406
	407	#### Deferrable:
	408
	409
	410	* Add error checking to your reader functions to make sure parens
	411	are properly matched. Catch and print these errors in your main
	412	loop. If your language does not have try/catch style bubble up
	413	exception handling, then you will need to add explicit error
	414	handling to your code to catch and pass on errors without crashing.
	415
* Add support for the other basic data types to your reader and printer
	417	functions: string, nil, true, and false. These become mandatory at
	418	step 4. When a string is read, the following transformations are
	419	applied: a backslash followed by a doublequote is translated into
	420	a plain doublequote character, a backslash followed by "n" is
	421	translated into a newline, and a backslash followed by another
	422	backslash is translated into a single backslash. To properly print
	423	a string (for step 4 string functions), the `pr_str` function needs
	424	another parameter called `print_readably`. When `print_readably` is
	425	true, doublequotes, newlines, and backslashes are translated into
	426	their printed representations (the reverse of the reader). The
	427	`PRINT` function in the main program should call `pr_str` with
	428	print_readably set to true.
	429
	430	* Add support for the other mal types: keyword, vector, hash-map.
  * keyword: a keyword is a token that begins with a colon. A keyword
    can just be stored as a string with a special Unicode prefix like
    0x29E (or char 0xff/127 if the target language does not have good
    Unicode support) and the printer translates strings with that
    prefix back to the keyword representation. This makes it easy to
    use keywords as hash map keys in most languages. You can also
    store keywords as a unique data type, but you will need to make
    sure they can be used as hash map keys (which may involve doing
    a similar prefixed translation anyway).
  * vector: a vector can be implemented with the same underlying
    type as a list as long as there is some mechanism to keep track of
    the difference. You can use the same reader function for both
    lists and vectors by adding parameters for the starting and ending
    tokens.
  * hash-map: a hash-map is an associative data structure that maps
    strings to other mal values. If you implement keywords as prefixed
    strings, then you only need a native associative data structure
    which supports string keys. Clojure allows any value to be a hash
    map key, but the base functionality in mal is to support string
    and keyword keys. Because of the representation of hash-maps as
    an alternating sequence of keys and values, you can probably use
    the same reader function for hash-maps as lists and vectors with
    parameters to indicate the starting and ending tokens. The odd
    tokens are then used for keys with the corresponding even tokens
    as the values.
	456
	457	* Add support for reader macros which are forms that are
	458	transformed into other forms during the read phase. Refer to
	459	`tests/step1_read_print.mal` for the form that these macros should
	460	take (they are just simple transformations of the token stream).
	461
	462	* Add comment support to your reader. The tokenizer should ignore
	463	tokens that start with ";". Your `read_str` function will need to
	464	properly handle when the tokenizer returns no values. The simplest
way to do this is to return the `nil` mal value. A cleaner option (one
that does not print `nil` at the prompt) is to throw a special exception
	467	that causes the main loop to simply continue at the beginning of the
	468	loop without calling `rep`.
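A Python sketch of the string, keyword and hash-map handling described above. The names `read_string_token`, `read_keyword`, `print_string` and `make_hash_map` are hypothetical helpers, `print_readably` is the parameter named in the guide, and `KEYWORD_PREFIX` uses the 0x29E prefix suggested above. Unescaping is done in a single left-to-right pass, because chaining `str.replace` calls would mishandle input such as `\\n` (an escaped backslash followed by `n`).

```python
# Hypothetical helpers for step 1's deferrable types.
KEYWORD_PREFIX = "\u029e"  # the 0x29E prefix suggested above

# Escape sequences the reader understands (backslash + char -> char).
_UNESCAPES = {"n": "\n", '"': '"', "\\": "\\"}


def read_string_token(token):
    """Turn a string token (quotes included) into its value."""
    body = token[1:-1]
    out = []
    i = 0
    while i < len(body):
        if body[i] == "\\" and i + 1 < len(body):
            nxt = body[i + 1]
            # Unknown escapes are kept verbatim (an assumption).
            out.append(_UNESCAPES.get(nxt, "\\" + nxt))
            i += 2
        else:
            out.append(body[i])
            i += 1
    return "".join(out)


def read_keyword(token):
    """':abc' -> prefixed string, usable directly as a dict key."""
    return KEYWORD_PREFIX + token[1:]


def print_string(s, print_readably=True):
    """Printer side: keywords first, then readable or raw strings."""
    if s.startswith(KEYWORD_PREFIX):
        return ":" + s[len(KEYWORD_PREFIX):]
    if not print_readably:
        return s
    # Backslashes first, so the escapes added next are not doubled.
    escaped = s.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return '"' + escaped + '"'


def make_hash_map(forms):
    """Build a dict from an alternating key/value sequence."""
    if len(forms) % 2 != 0:
        raise SyntaxError("hash-map needs an even number of forms")
    return dict(zip(forms[0::2], forms[1::2]))
```

Storing keywords as prefixed strings means `make_hash_map` needs no special case for them: they are ordinary `str` keys that `print_string` turns back into `:name`.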
	469
	470
	471	<a name="step2"></a>
	472
	473	### Step 2: Eval
	474
	475	![step2_eval architecture](step2_eval.png)
	476
	477	In step 1 your mal interpreter was basically just a way to validate
	478	input and eliminate extraneous white space. In this step you will turn
	479	your interpreter into a simple number calculator by adding
	480	functionality to the evaluator (`EVAL`).
	481
	482	Compare the pseudocode for step 1 and step 2 to get a basic idea of
	483	the changes that will be made during this step:
	484	```
	485	diff -urp ../process/step1_read_print.txt ../process/step2_eval.txt
	486	```
	487
	488	* Copy `step1_read_print.qx` to `step2_eval.qx`.
	489
	490	* Define a simple initial REPL environment. This environment is an
	491	associative structure that maps symbols (or symbol names) to
numeric functions. For example, in Python this would look something
	493	like this:
	494	```
repl_env = {'+': lambda a,b: a+b,
            '-': lambda a,b: a-b,
            '*': lambda a,b: a*b,
            '/': lambda a,b: int(a/b)}
	499	```
	500
	501	* Modify the `rep` function to pass the REPL environment as the second
	502	parameter for the `EVAL` call.
	503
	504	* Create a new function `eval_ast` which takes `ast` (mal data type)
	505	and an associative structure (the environment from above).
	506	`eval_ast` switches on the type of `ast` as follows:
	507
  * symbol: look up the symbol in the environment structure and return
    the value, or raise an error if no value is found
  * list: return a new list that is the result of calling `EVAL` on
    each of the members of the list
  * otherwise just return the original `ast` value
	513
	514	* Modify `EVAL` to check if the first parameter `ast` is a list.
  * `ast` is not a list: then return the result of calling `eval_ast`
    on it.
  * `ast` is an empty list: return ast unchanged.
  * `ast` is a list: call `eval_ast` to get a new evaluated list. Take
    the first item of the evaluated list and call it as a function using
    the rest of the evaluated list as its arguments.
	521
	522	If your target language does not have full variable length argument
	523	support (e.g. variadic, vararg, splats, apply) then you will need to
	524	pass the full list of arguments as a single parameter and split apart
	525	the individual values inside of every mal function. This is annoying,
	526	but workable.
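A Python sketch of `eval_ast` and `EVAL` for this step, assuming a hypothetical `Symbol` class (a `str` subclass) and a plain `dict` as the environment. Python's `f(*args)` is the "splat" mentioned above; a language without varargs would instead pass `args` as a single list and unpack it inside each function.

```python
import operator


class Symbol(str):
    """Hypothetical mal symbol type: a str subclass holding the name."""


repl_env = {
    "+": operator.add,
    "-": operator.sub,
    "*": operator.mul,
    "/": lambda a, b: int(a / b),  # integer division truncating toward zero
}


def eval_ast(ast, env):
    if isinstance(ast, Symbol):
        if ast not in env:
            raise KeyError(f"'{ast}' not found")
        return env[ast]
    if isinstance(ast, list):
        return [EVAL(item, env) for item in ast]
    return ast  # numbers and other scalars evaluate to themselves


def EVAL(ast, env):
    if not isinstance(ast, list):
        return eval_ast(ast, env)
    if not ast:
        return ast  # the empty list evaluates to itself
    f, *args = eval_ast(ast, env)
    # f(*args) spreads the list into positional arguments ("splat");
    # without varargs you would call f(args) and unpack inside f.
    return f(*args)
```

Because `Symbol` inherits `str` hashing and equality, `Symbol("+")` finds the `"+"` entry in `repl_env` without any conversion.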
	527
	528	The process of taking a list and invoking or executing it to return
	529	something new is known in Lisp as the "apply" phase.
	530
	531	Try some simple expressions:
	532
	533	* `(+ 2 3)` -> `5`
	534	* `(+ 2 (* 3 4))` -> `14`
	535
	536	The most likely challenge you will encounter is how to properly call
a function reference using an argument list.
	538
	539	Now go to the top level, run the step 2 tests and fix the errors.
	540	```
	541	make "test^quux^step2"
	542	```
	543
	544	You now have a simple prefix notation calculator!
	545
	546
	547	<a name="step3"></a>
	548
	549	### Step 3: Environments
	550
	551	![step3_env architecture](step3_env.png)
	552
In step 2 you were already introduced to the REPL environment (`repl_env`)
	554	where the basic numeric functions were stored and looked up. In this
	555	step you will add the ability to create new environments (`let*`) and
	556	modify existing environments (`def!`).
	557
	558	A Lisp environment is an associative data structure that maps symbols (the
	559	keys) to values. But Lisp environments have an additional important
	560	function: they can refer to another environment (the outer
	561	environment). During environment lookups, if the current environment
	562	does not have the symbol, the lookup continues in the outer
	563	environment, and continues this way until the symbol is either found,
	564	or the outer environment is `nil` (the outermost environment in the
	565	chain).
	566
	567	Compare the pseudocode for step 2 and step 3 to get a basic idea of
	568	the changes that will be made during this step:
	569	```
	570	diff -urp ../process/step2_eval.txt ../process/step3_env.txt
	571	```
	572
	573	* Copy `step2_eval.qx` to `step3_env.qx`.
	574
	575	* Create `env.qx` to hold the environment definition.
	576
	577	* Define an `Env` object that is instantiated with a single `outer`
	578	parameter and starts with an empty associative data structure
	579	property `data`.
	580
	581	* Define three methods for the Env object:
  * set: takes a symbol key and a mal value and adds them to the `data`
    structure
  * find: takes a symbol key and if the current environment contains
    that key then returns the environment. If no key is found and outer
    is not `nil` then calls find (recurse) on the outer environment.
  * get: takes a symbol key and uses the `find` method to locate the
    environment with the key, then returns the matching value. If no
    key is found up the outer chain, then throws/raises a "not found"
    error.
	591
	592	* Update `step3_env.qx` to use the new `Env` type to create the
	593	repl_env (with a `nil` outer value) and use the `set` method to add
	594	the numeric functions.
	595
	596	* Modify `eval_ast` to call the `get` method on the `env` parameter.
	597
	598	* Modify the apply section of `EVAL` to switch on the first element of
	599	the list:
  * symbol "def!": call the set method of the current environment
    (second parameter of `EVAL` called `env`) using the unevaluated
    first parameter (second list element) as the symbol key and the
    evaluated second parameter as the value.
  * symbol "let\*": create a new environment using the current
    environment as the outer value and then use the first parameter as
    a list of new bindings in the "let\*" environment. Take the second
    element of the binding list, call `EVAL` using the new "let\*"
    environment as the evaluation environment, then call `set` on the
    "let\*" environment using the first binding list element as the key
    and the evaluated second element as the value. This is repeated
    for each odd/even pair in the binding list. Note in particular that
    the bindings earlier in the list can be referred to by later
    bindings. Finally, the second parameter (third element) of the
    original `let*` form is evaluated using the new "let\*" environment
    and the result is returned as the result of the `let*` (the new
    let environment is discarded upon completion).
  * otherwise: call `eval_ast` on the list and apply the first element
    to the rest as before.
	619
	620	`def!` and `let*` are Lisp "specials" (or "special atoms") which means
	621	that they are language level features and more specifically that the
	622	rest of the list elements (arguments) may be evaluated differently (or
	623	not at all) unlike the default apply case where all elements of the
	624	list are evaluated before the first element is invoked. Lists which
	625	contain a "special" as the first element are known as "special forms".
They are special because they follow special evaluation rules.
	627
	628	Try some simple environment tests:
	629
	630	* `(def! a 6)` -> `6`
	631	* `a` -> `6`
	632	* `(def! b (+ a 2))` -> `8`
	633	* `(+ a b)` -> `14`
	634	* `(let* (c 2) c)` -> `2`
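The `Env` object and the `def!`/`let*` branches of `EVAL` might look like this in Python. It is a sketch assuming a hypothetical `Symbol` type (a `str` subclass), Python lists for both forms and `let*` binding lists, and a helper `is_special` (also hypothetical) that only matches symbols, so a number or nested list in head position falls through to the apply case.

```python
import operator


class Symbol(str):
    """Hypothetical mal symbol type: a str subclass holding the name."""


class Env:
    def __init__(self, outer=None):
        self.outer = outer
        self.data = {}

    def set(self, key, value):
        self.data[key] = value
        return value

    def find(self, key):
        # Walk outward until some environment holds the key.
        if key in self.data:
            return self
        if self.outer is not None:
            return self.outer.find(key)
        return None

    def get(self, key):
        env = self.find(key)
        if env is None:
            raise KeyError(f"'{key}' not found")
        return env.data[key]


def is_special(head, name):
    return isinstance(head, Symbol) and head == name


def eval_ast(ast, env):
    if isinstance(ast, Symbol):
        return env.get(ast)
    if isinstance(ast, list):
        return [EVAL(item, env) for item in ast]
    return ast


def EVAL(ast, env):
    if not isinstance(ast, list):
        return eval_ast(ast, env)
    if not ast:
        return ast
    head = ast[0]
    if is_special(head, "def!"):
        # The key is NOT evaluated; the value is.
        return env.set(ast[1], EVAL(ast[2], env))
    if is_special(head, "let*"):
        let_env = Env(env)
        bindings = ast[1]
        for i in range(0, len(bindings), 2):
            # Evaluate in let_env so later bindings see earlier ones.
            let_env.set(bindings[i], EVAL(bindings[i + 1], let_env))
        return EVAL(ast[2], let_env)  # let_env is dropped afterwards
    f, *args = eval_ast(ast, env)
    return f(*args)


repl_env = Env()
repl_env.set(Symbol("+"), operator.add)
repl_env.set(Symbol("-"), operator.sub)
repl_env.set(Symbol("*"), operator.mul)
repl_env.set(Symbol("/"), lambda a, b: int(a / b))
```

Evaluating `[Symbol("def!"), Symbol("a"), 6]` against `repl_env` returns 6 and leaves `a` bound there, while a name bound by `let*` is gone once the form returns.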
	635
	636	Now go to the top level, run the step 3 tests and fix the errors.
	637	```
	638	make "test^quux^step3"
	639	```
	640
Your mal implementation is still basically just a numeric calculator
	642	with save/restore capability. But you have set the foundation for step
	643	4 where it will begin to feel like a real programming language.
	644
	645
	646	An aside on mutation and typing:
	647
	648	The "!" suffix on symbols is used to indicate that this symbol refers
	649	to a function that mutates something else. In this case, the `def!`
	650	symbol indicates a special form that will mutate the current
environment. Many (maybe even most) of the runtime problems that are
	652	encountered in software engineering are a result of mutation. By
	653	clearly marking code where mutation may occur, you can more easily
	654	track down the likely cause of runtime problems when they do occur.
	655
	656	Another cause of runtime errors is type errors, where a value of one
	657	type is unexpectedly treated by the program as a different and
	658	incompatible type. Statically typed languages try to make the
	659	programmer solve all type problems before the program is allowed to
	660	run. Most Lisp variants tend to be dynamically typed (types of values
	661	are checked when they are actually used at runtime).
	662
	663	As an aside-aside: The great debate between static and dynamic typing
	664	can be understood by following the money. Advocates of strict static
	665	typing use words like "correctness" and "safety" and thus get
	666	government and academic funding. Advocates of dynamic typing use words
	667	like "agile" and "time-to-market" and thus get venture capital and
	668	commercial funding.
	669
	670
	671	<a name="step4"></a>
	672
	673	### Step 4: If Fn Do
	674
	675	![step4_if_fn_do architecture](step4_if_fn_do.png)
	676
	677	In step 3 you added environments and the special forms for
	678	manipulating environments. In this step you will add 3 new special
	679	forms (`if`, `fn*` and `do`) and add several more core functions to
the default REPL environment.
	682
	683	The `fn*` special form is how new user-defined functions are created.
	684	In some Lisps, this special form is named "lambda".
	685
	686	Compare the pseudocode for step 3 and step 4 to get a basic idea of
	687	the changes that will be made during this step:
	688	```
	689	diff -urp ../process/step3_env.txt ../process/step4_if_fn_do.txt
	690	```
	691
	692	* Copy `step3_env.qx` to `step4_if_fn_do.qx`.
	693
	694	* If you have not implemented reader and printer support (and data
	695	types) for `nil`, `true` and `false`, you will need to do so for
	696	this step.
	697
	698	* Update the constructor/initializer for environments to take two new
	699	arguments: `binds` and `exprs`. Bind (`set`) each element (symbol)
	700	of the binds list to the respective element of the `exprs` list.
	701
* Add support to `printer.qx` to print function values. A string
  literal like "#<function>" is sufficient.
	704
* Add the following special forms to `EVAL`:

  * `do`: Evaluate all the elements of the list using `eval_ast`
    and return the final evaluated element.
  * `if`: Evaluate the first parameter (second element). If the result
    (condition) is anything other than `nil` or `false`, then evaluate
    the second parameter (third element of the list) and return the
    result. Otherwise, evaluate the third parameter (fourth element)
    and return the result. If the condition is `nil` or `false` and
    there is no third parameter, then just return `nil`.
  * `fn*`: Return a new function closure. The body of that closure
    does the following:
    * Create a new environment using `env` (closed over from outer
      scope) as the `outer` parameter, the first parameter (second
      list element of `ast` from the outer scope) as the `binds`
      parameter, and the parameters to the closure as the `exprs`
      parameter.
    * Call `EVAL` on the second parameter (third list element of `ast`
      from outer scope), using the new environment. Use the result as
      the return value of the closure.
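The special forms above can be sketched in Python, assuming mal lists are Python lists, symbols are a hypothetical `Symbol` subclass of `str`, `nil`/`true`/`false` map to `None`/`True`/`False`, and core functions are plain Python callables. This is a minimal illustration rather than a reference implementation: `def!` and `let*` from step 3 are left out, and the `Env` constructor already includes the deferrable `&` handling described later in this step.

```python
class Symbol(str):
    """Symbols get their own type so they are not confused with mal strings."""


class Env:
    def __init__(self, outer=None, binds=(), exprs=()):
        self.outer, self.data = outer, {}
        binds, exprs = list(binds), list(exprs)
        for i, sym in enumerate(binds):
            if sym == "&":
                # Deferrable: the symbol after "&" gets all remaining exprs as a list.
                self.data[binds[i + 1]] = exprs[i:]
                break
            self.data[sym] = exprs[i]  # bind each symbol to the matching value

    def set(self, key, value):
        self.data[key] = value
        return value

    def get(self, key):
        env = self
        while env is not None:
            if key in env.data:
                return env.data[key]
            env = env.outer
        raise KeyError(f"'{key}' not found")


def eval_ast(ast, env):
    if isinstance(ast, Symbol):
        return env.get(ast)
    if isinstance(ast, list):
        return [EVAL(item, env) for item in ast]
    return ast


def EVAL(ast, env):
    if not isinstance(ast, list) or not ast:
        return eval_ast(ast, env)
    head = ast[0] if isinstance(ast[0], Symbol) else None
    if head == "do":
        # Evaluate every form after `do`; the last result is the value.
        return eval_ast(ast[1:], env)[-1]
    if head == "if":
        cond = EVAL(ast[1], env)
        if cond is None or cond is False:
            return EVAL(ast[3], env) if len(ast) > 3 else None
        return EVAL(ast[2], env)
    if head == "fn*":
        params, body = ast[1], ast[2]
        # The closure captures `env`; each call binds params to the call's args.
        return lambda *args: EVAL(body, Env(env, params, args))
    # Default (apply) case: evaluate the list, then call the first element.
    f, *args = eval_ast(ast, env)
    return f(*args)
```

Note that `if` tests for `nil` and `false` explicitly, because in mal values such as `0` and `()` are truthy, unlike in Python.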
	725
If your target language does not support closures, then you will need
to implement `fn*` using some sort of structure or object that stores
the values being closed over: the first and second parameters of the
`fn*` form (the second and third elements of the `ast` list, i.e. the
function parameter list and the function body) and the current
environment `env`. In this case, your native functions will need to be
wrapped in the same way. You will probably also need a method/function
that invokes your function object/structure for the default case of
the apply section of `EVAL`.
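Python has closures, so a Python port would not need this, but the closure-free design described above can still be sketched in it. `UserFn`, `NativeFn` and `invoke` are hypothetical names; `eval_fn` and `make_env` are parameters standing in for your `EVAL` and environment constructor so that the sketch stays self-contained.

```python
class NativeFn:
    """A built-in (core) function, wrapped so it can be dispatched like a user function."""

    def __init__(self, func):
        self.func = func


class UserFn:
    """What fn* returns when the host has no closures: the closed-over values as data."""

    def __init__(self, params, body, env):
        self.params = params  # second element of the fn* form
        self.body = body      # third element of the fn* form
        self.env = env        # environment current when fn* was evaluated


def invoke(f, args, eval_fn, make_env):
    """Called from the default (apply) case of EVAL with already evaluated args."""
    if isinstance(f, NativeFn):
        return f.func(*args)
    if isinstance(f, UserFn):
        # New environment: outer = captured env, binds = params, exprs = args.
        return eval_fn(f.body, make_env(f.env, f.params, args))
    raise TypeError(f"{f!r} is not a function")
```

The default (apply) case of `EVAL` would then call `invoke(f, args, EVAL, Env)` instead of calling `f` directly.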
	734
	735	Try out the basic functionality you have implemented:
	736
	737	* `(fn* [a] a)` -> `#<function>`
	738	* `( (fn* [a] a) 7)` -> `7`
	739	* `( (fn* [a] (+ a 1)) 10)` -> `11`
	740	* `( (fn* [a b] (+ a b)) 2 3)` -> `5`
	741
	742	* Add a new file `core.qx` and define an associative data structure
	743	`ns` (namespace) that maps symbols to functions. Move the numeric
	744	function definitions into this structure.
	745
	746	* Modify `step4_if_fn_do.qx` to iterate through the `core.ns`
	747	structure and add (`set`) each symbol/function mapping to the
	748	REPL environment (`repl_env`).
	749
* Add the following functions to `core.ns`:
  * `list`: take the parameters and return them as a list.
  * `list?`: return true if the first parameter is a list, false
    otherwise.
  * `empty?`: treat the first parameter as a list and return true if
    the list is empty and false if it contains any elements.
  * `count`: treat the first parameter as a list and return the number
    of elements that it contains.
  * `=`: compare the first two parameters and return true if they are
    the same type and contain the same value. In the case of equal
    length lists, each element of the list should be compared for
    equality and if they are the same return true, otherwise false.
  * `<`, `<=`, `>`, and `>=`: treat the first two parameters as
    numbers and do the corresponding numeric comparison, returning
    either true or false.
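A Python sketch of `core.ns`, assuming mal lists are Python lists, `nil` is `None`, and a plain dict stands in for the REPL `Env`. The `=` helper checks the type first so that, for example, `1` and `true` do not compare equal (Python itself treats `1 == True` as true).

```python
import operator


def mal_equal(a, b):
    """`=`: true when both values have the same type and value; lists compared element-wise."""
    if isinstance(a, list) and isinstance(b, list):
        return len(a) == len(b) and all(mal_equal(x, y) for x, y in zip(a, b))
    return type(a) is type(b) and a == b


ns = {
    # Numeric functions moved here from the REPL environment of earlier steps.
    "+": operator.add,
    "-": operator.sub,
    "*": operator.mul,
    "/": operator.floordiv,  # note: Python floors toward negative infinity
    "list": lambda *args: list(args),
    "list?": lambda x: isinstance(x, list),
    "empty?": lambda x: len(x) == 0,
    "count": lambda x: 0 if x is None else len(x),  # nil counted as empty (a choice made here)
    "=": mal_equal,
    "<": operator.lt,
    "<=": operator.le,
    ">": operator.gt,
    ">=": operator.ge,
}

# In step4_if_fn_do, copy each symbol/function mapping into the REPL environment.
repl_env = {}
for name, fn in ns.items():
    repl_env[name] = fn
```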
	765
Now go to the top level and run the step 4 tests. There are a lot of
tests in step 4, but all of the non-optional tests that do not involve
strings should be able to pass now.
	769
	770	```
	771	make "test^quux^step4"
	772	```
	773
Your mal implementation is already beginning to look like a real
language. You have flow control, conditionals, user-defined functions
with lexical scope, side effects (if you implement the string
functions), etc. However, our little interpreter has not quite reached
Lisp-ness yet. The next several steps will take your implementation
from a neat toy to a full-featured language.
	780
	781	#### Deferrable:
	782
* Implement Clojure-style variadic function parameters. Modify the
  constructor/initializer for environments so that if a "&" symbol is
  encountered in the `binds` list, the next symbol in the `binds` list
  after the "&" is bound to the rest of the `exprs` list that has not
  been bound yet.
	788
	789	* Define a `not` function using mal itself. In `step4_if_fn_do.qx`
	790	call the `rep` function with this string:
	791	"(def! not (fn* (a) (if a false true)))".
	792
* Implement the string functions in `core.qx`. To implement these
  functions, you will need to implement the string support in the
  reader and printer (deferrable section of step 1). Each of the string
  functions takes multiple mal values, prints them (`pr_str`) and
  joins them together into a new string.
  * `pr-str`: calls `pr_str` on each argument with `print_readably`
    set to true, joins the results with " " and returns the new
    string.
  * `str`: calls `pr_str` on each argument with `print_readably` set
    to false, concatenates the results together ("" separator), and
    returns the new string.
  * `prn`: calls `pr_str` on each argument with `print_readably` set
    to true, joins the results with " ", prints the string to the
    screen and then returns `nil`.
  * `println`: calls `pr_str` on each argument with `print_readably` set
    to false, joins the results with " ", prints the string to the
    screen and then returns `nil`.
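The four string functions differ only in the `print_readably` flag, the separator, and whether they print. In the Python sketch below, `pr_str` is a cut-down stand-in for your step 1 printer (it knows only `nil`, booleans, numbers, lists and strings, and treats every Python `str` as a mal string); `pr_str_fn` and `str_fn` are hypothetical names chosen to avoid clashing with Python's built-in `str`.

```python
def pr_str(obj, print_readably=True):
    """Minimal printer stand-in: nil, booleans, strings, lists, numbers."""
    if obj is None:
        return "nil"
    if obj is True:
        return "true"
    if obj is False:
        return "false"
    if isinstance(obj, str):
        if print_readably:
            # Escape backslashes first, then double quotes and newlines.
            escaped = obj.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
            return f'"{escaped}"'
        return obj
    if isinstance(obj, list):
        return "(" + " ".join(pr_str(x, print_readably) for x in obj) + ")"
    return str(obj)


def pr_str_fn(*args):  # mal `pr-str`
    return " ".join(pr_str(a, True) for a in args)


def str_fn(*args):  # mal `str`
    return "".join(pr_str(a, False) for a in args)


def prn(*args):
    print(" ".join(pr_str(a, True) for a in args))
    return None  # mal nil


def println(*args):
    print(" ".join(pr_str(a, False) for a in args))
    return None  # mal nil
```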
	810
	811
	812	<a name="step5"></a>
	813
	814	### Step 5: Tail call optimization
	815
	816	![step5_tco architecture](step5_tco.png)
	817
In step 4 you added the special forms `do`, `if` and `fn*` and you defined
some core functions. In this step you will add a Lisp feature called
tail call optimization (TCO), which is also called "tail recursion" or
sometimes just "tail calls".
	822
	823	Several of the special forms that you have defined in `EVAL` end up
	824	calling back into `EVAL`. For those forms that call `EVAL` as the last
	825	thing that they do before returning (tail call) you will just loop back
	826	to the beginning of eval rather than calling it again. The advantage
	827	of this approach is that it avoids adding more frames to the call
	828	stack. This is especially important in Lisp languages because they tend
	829	to prefer using recursion instead of iteration for control structures.
	830	(Though some Lisps, such as Common Lisp, have iteration.) However, with
	831	tail call optimization, recursion can be made as stack efficient as
	832	iteration.
	833
	834	Compare the pseudocode for step 4 and step 5 to get a basic idea of
	835	the changes that will be made during this step:
	836	```
	837	diff -urp ../process/step4_if_fn_do.txt ../process/step5_tco.txt
	838	```
	839
	840	* Copy `step4_if_fn_do.qx` to `step5_tco.qx`.
	841
	842	* Add a loop (e.g. while true) around all code in `EVAL`.
	843
* Modify each of the following form cases to add tail call recursion
  support:
  * `let*`: remove the final `EVAL` call on the second `ast` argument
    (third list element). Set `env` (i.e. the local variable passed in
    as second parameter of `EVAL`) to the new let environment. Set
    `ast` (i.e. the local variable passed in as first parameter of
    `EVAL`) to be the second `ast` argument. Continue at the beginning
    of the loop (no return).
  * `do`: change the `eval_ast` call to evaluate all the parameters
    except for the last (2nd list element up to but not including
    last). Set `ast` to the last element of `ast`. Continue
    at the beginning of the loop (`env` stays unchanged).
  * `if`: the condition continues to be evaluated; however, rather
    than evaluating the true or false branch, `ast` is set to the
    unevaluated value of the chosen branch. Continue at the beginning
    of the loop (`env` is unchanged). If the condition is false and
    there is no third parameter, return `nil` as before.
	860
* The return value from the `fn*` special form will now become an
  object/structure with attributes that allow the default invoke case
  of `EVAL` to do TCO on mal functions. Those attributes are:
  * `ast`: the second `ast` argument (third list element) representing
    the body of the function.
  * `params`: the first `ast` argument (second list element)
    representing the parameter names of the function.
  * `env`: the current value of the `env` parameter of `EVAL`.
  * `fn`: the original function value (i.e. what was returned by `fn*`
    in step 4). Note that this is deferrable until step 9, when it is
    needed for the `map` and `apply` core functions.
	872
* The default "apply"/invoke case of `EVAL` must now be changed to
  account for the new object/structure returned by the `fn*` form.
  Continue to call `eval_ast` on `ast`. The first element of the
  evaluated list is `f`. Switch on the type of `f`:
  * regular function (not one defined by `fn*`): apply/invoke it as
    before (in step 4).
  * a `fn*` value: set `ast` to the `ast` attribute of `f`. Generate
    a new environment using the `env` and `params` attributes of `f`
    as the `outer` and `binds` arguments and the rest of the evaluated
    list (elements 2 through the end) as the `exprs` argument. Set
    `env` to the new environment. Continue at the beginning of the loop.
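Putting the step 5 changes together, here is a Python sketch of the TCO version of `EVAL`. It assumes the same representation as before (Python lists, a hypothetical `Symbol` subclass of `str`, `None`/`True`/`False`), and `MalFunction` is a hypothetical name for the object returned by `fn*`. Every case that used to end in a recursive `EVAL` call now updates `ast` (and possibly `env`) and continues the `while` loop.

```python
from dataclasses import dataclass
from typing import Any, Callable


class Symbol(str):
    """Symbols get their own type so they are not confused with mal strings."""


class Env:
    def __init__(self, outer=None, binds=(), exprs=()):
        self.outer, self.data = outer, dict(zip(binds, exprs))

    def set(self, key, value):
        self.data[key] = value
        return value

    def get(self, key):
        env = self
        while env is not None:
            if key in env.data:
                return env.data[key]
            env = env.outer
        raise KeyError(f"'{key}' not found")


@dataclass
class MalFunction:
    ast: Any      # function body (third element of the fn* form)
    params: list  # parameter symbols (second element of the fn* form)
    env: Env      # environment captured when fn* was evaluated
    fn: Callable  # plain callable, needed later by apply/map


def eval_ast(ast, env):
    if isinstance(ast, Symbol):
        return env.get(ast)
    if isinstance(ast, list):
        return [EVAL(item, env) for item in ast]
    return ast


def EVAL(ast, env):
    while True:
        if not isinstance(ast, list) or not ast:
            return eval_ast(ast, env)
        head = ast[0] if isinstance(ast[0], Symbol) else None
        if head == "def!":
            return env.set(ast[1], EVAL(ast[2], env))
        if head == "let*":
            let_env, bindings = Env(env), ast[1]
            for i in range(0, len(bindings), 2):
                let_env.set(bindings[i], EVAL(bindings[i + 1], let_env))
            ast, env = ast[2], let_env  # tail position: loop instead of recursing
            continue
        if head == "do":
            eval_ast(ast[1:-1], env)  # every form except the last
            ast = ast[-1]
            continue
        if head == "if":
            cond = EVAL(ast[1], env)
            if cond is None or cond is False:
                if len(ast) < 4:
                    return None
                ast = ast[3]
            else:
                ast = ast[2]
            continue
        if head == "fn*":
            params, body = ast[1], ast[2]
            fn = lambda *args: EVAL(body, Env(env, params, args))
            return MalFunction(body, params, env, fn)
        f, *args = eval_ast(ast, env)
        if isinstance(f, MalFunction):
            # Reuse this loop iteration instead of a nested EVAL call.
            ast, env = f.ast, Env(f.env, f.params, args)
            continue
        return f(*args)
```

With this loop, a tail-recursive function in the style of `sum2` runs in constant Python stack depth: the check below makes 10,000 recursive mal calls, far more than Python's default recursion limit of 1000 would allow if each call nested new `EVAL` frames.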
	884
	885	Run some manual tests from previous steps to make sure you have not
	886	broken anything by adding TCO.
	887
Now go to the top level and run the step 5 tests.
	889
	890	```
	891	make "test^quux^step5"
	892	```
	893
Look at the step 5 test file `tests/step5_tco.mal`. The `sum-to`
function cannot be tail call optimized because it does something after
the recursive call (`sum-to` calls itself and then does the addition).
Lispers say that the recursive call in `sum-to` is not in tail position.
The `sum2` function, however, calls itself from tail position. In other
words, the recursive call to `sum2` is the last action that `sum2` does.
Calling `sum-to` with a large value will cause a stack overflow exception
in most target languages (some have super-special tricks they use to
avoid stack overflows).
	903
	904	Congratulations, your mal implementation already has a feature (TCO)
	905	that most mainstream languages lack.
	906
	907
	908	<a name="step6"></a>
	909
	910	### Step 6: Files, Mutation, and Evil
	911
	912	![step6_file architecture](step6_file.png)
	913
	914	In step 5 you added tail call optimization. In this step you will add
	915	some string and file operations and give your implementation a touch
	916	of evil ... er, eval. And as long as your language supports function
	917	closures, this step will be quite simple. However, to complete this
	918	step, you must implement string type support, so if you have been
	919	holding off on that you will need to go back and do so.
	920
	921	Compare the pseudocode for step 5 and step 6 to get a basic idea of
	922	the changes that will be made during this step:
	923	```
	924	diff -urp ../process/step5_tco.txt ../process/step6_file.txt
	925	```
	926
	927	* Copy `step5_tco.qx` to `step6_file.qx`.
	928
* Add two new string functions to the core namespace:
  * `read-string`: this function just exposes the `read_str` function
    from the reader. If your mal string type is not the same as your
    target language's native string type (e.g. in a statically typed
    language), then your `read-string` function will need to unbox
    (extract) the raw string from the mal string type in order to call
    `read_str`.
  * `slurp`: this function takes a file name (string) and returns the
    contents of the file as a string. Once again, if your mal string
    type wraps a raw target language string, then you will need to
    unmarshal (extract) the string parameter to get the raw file name
    string and marshal (wrap) the result back to a mal string type.
	940
* In your main program, add a new symbol "eval" to your REPL
  environment. The value of this new entry is a function that takes
  a single argument `ast`. The closure calls your `EVAL` function
  using the `ast` as the first argument and the REPL environment
  (closed over from outside) as the second argument. The result of
  the `EVAL` call is returned. This simple but powerful addition
  allows your program to treat mal data as a mal program. For example,
  you can now do this:
	949	```
	950	(def! mal-prog (list + 1 2))
	951	(eval mal-prog)
	952	```
	953
	954	* Define a `load-file` function using mal itself. In your main
	955	program call the `rep` function with this string:
	956	"(def! load-file (fn* (f) (eval (read-string (str \"(do \" (slurp f) \")\")))))".
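In Python, mal strings can simply be Python `str` values, so no boxing is needed. To show the unboxing and marshalling described for `read-string` and `slurp`, the sketch below uses a hypothetical `MalString` wrapper, as a statically typed host might; `read_str` is passed in as a parameter standing in for your step 1 reader.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MalString:
    """Hypothetical boxed string type, as a statically typed host language might use."""
    value: str


def make_io_functions(read_str):
    """Build the two step 6 core functions around your reader's `read_str`."""

    def read_string(s):
        # Unbox the raw host string before handing it to the reader.
        return read_str(s.value)

    def slurp(filename):
        # Unmarshal the file name, read the file, and marshal the contents back.
        with open(filename.value, encoding="utf-8") as f:
            return MalString(f.read())

    return {"read-string": read_string, "slurp": slurp}
```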
	957
	958	Try out `load-file`:
	959	* `(load-file "../tests/incA.mal")` -> `9`
	960	* `(inc4 3)` -> `7`
	961
	962	The `load-file` function does the following:
	963	* Call `slurp` to read in a file by name. Surround the contents with
	964	"(do ...)" so that the whole file will be treated as a single
	965	program AST (abstract syntax tree).
	966	* Call `read-string` on the string returned from `slurp`. This uses
	967	the reader to read/convert the file contents into mal data/AST.
	968	* Call `eval` (the one in the REPL environment) on the AST returned
	969	from `read-string` to "run" it.
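The `eval` entry added to the REPL environment in this step is just a closure over that environment. The Python sketch below pairs it with a toy `EVAL` (an assumption, far simpler than the real step 6 evaluator) that is just enough to run the `(eval mal-prog)` example; a dict stands in for the REPL `Env`.

```python
def EVAL(ast, env):
    """Toy evaluator: just enough for (eval (list + 1 2)); not the real step 6 EVAL."""
    if isinstance(ast, list):
        f, *args = [EVAL(item, env) for item in ast]
        return f(*args)
    if isinstance(ast, str):
        return env[ast]  # in this toy, symbols are plain strings
    return ast  # numbers, functions, etc. evaluate to themselves


repl_env = {"+": lambda a, b: a + b}

# The closure captures repl_env, so `eval` always evaluates in the REPL
# environment, whatever environment it was called from.
repl_env["eval"] = lambda ast: EVAL(ast, repl_env)
```

Because the closure always passes `repl_env`, `eval` does not see `let*` or `fn*` bindings that are local to the place where it was called.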
	970
	971	Besides adding file and eval support, we'll add support for the atom data type
	972	in this step. An atom is the Mal way to represent state; it is
	973	heavily inspired by [Clojure's atoms](http://clojure.org/state). An atom holds
	974	a reference to a single Mal value of any type; it supports reading that Mal value
	975	and modifying the reference to point to another Mal value. Note that this is
	976	the only Mal data type that is mutable (but the Mal values it refers to are
	977	still immutable; immutability is explained in greater detail in step 7).
	978	You'll need to add 5 functions to the core namespace to support atoms:
	979
	980	* `atom`: Takes a Mal value and returns a new atom which points to that Mal value.
	981	* `atom?`: Takes an argument and returns `true` if the argument is an atom.
	982	* `deref`: Takes an atom argument and returns the Mal value referenced by this atom.
	983	* `reset!`: Takes an atom and a Mal value; the atom is modified to refer to
	984	the given Mal value. The Mal value is returned.
	985	* `swap!`: Takes an atom, a function, and zero or more function arguments. The
	986	atom's value is modified to the result of applying the function with the atom's
	987	value as the first argument and the optionally given function arguments as
	988	the rest of the arguments. The new atom's value is returned. (Side note: Mal is
	989	single-threaded, but in concurrent languages like Clojure, `swap!` promises
	990	atomic update: `(swap! myatom (fn* [x] (+ 1 x)))` will always increase the
	991	`myatom` counter by one and will not suffer from missing updates when the
	992	atom is updated from multiple threads.)
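A Python sketch of the atom type and its five core functions. `Atom` and the Python function names are hypothetical; `swap` accepts any Python callable, so a mal function created by `fn*` would be passed in through its plain callable (the `fn` attribute introduced in step 5).

```python
class Atom:
    """Mal's only mutable type: a box holding a reference to another mal value."""

    def __init__(self, value):
        self.value = value


def atom(value):
    return Atom(value)


def is_atom(x):
    return isinstance(x, Atom)


def deref(a):
    return a.value


def reset(a, value):
    a.value = value  # repoint the atom; the old value itself is not modified
    return value


def swap(a, f, *args):
    # New value = f(current value, *extra args); the new value is returned.
    a.value = f(a.value, *args)
    return a.value


atom_ns = {"atom": atom, "atom?": is_atom, "deref": deref, "reset!": reset, "swap!": swap}
```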
	993
Optionally, you can add a reader macro `@` which will serve as a short form for
`deref`, so that `@a` is equivalent to `(deref a)`. In order to do that, modify
the conditional in the reader's `read_form` function and add a case which deals with
the `@` token: if the token is `@` (at sign) then return a new list that
contains the symbol `deref` and the result of reading the next form
(`read_form`).
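A Python sketch of the `@` case, assuming a step 1 style `Reader` with `next` and `peek` over already tokenized input, with symbols as plain strings and lists as Python lists. Only the `token == "@"` branch is new; `read_list` and `read_atom` are minimal versions included so the example runs.

```python
class Reader:
    """Token cursor with `next` and `peek`, in the style of the step 1 reader."""

    def __init__(self, tokens):
        self.tokens, self.position = tokens, 0

    def peek(self):
        if self.position < len(self.tokens):
            return self.tokens[self.position]
        return None

    def next(self):
        token = self.peek()
        self.position += 1
        return token


def read_form(reader):
    token = reader.peek()
    if token == "@":
        reader.next()  # consume "@"
        return ["deref", read_form(reader)]  # @form  ->  (deref form)
    if token == "(":
        return read_list(reader)
    return read_atom(reader)


def read_list(reader):
    reader.next()  # consume "("
    items = []
    while reader.peek() != ")":
        if reader.peek() is None:
            raise ValueError("expected ')', got EOF")
        items.append(read_form(reader))
    reader.next()  # consume ")"
    return items


def read_atom(reader):
    token = reader.next()
    if token.isdigit() or (token.startswith("-") and token[1:].isdigit()):
        return int(token)
    return token  # symbols are plain strings in this sketch
```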
	1000
Now go to the top level and run the step 6 tests. The optional tests will
need support from the reader for comments, vectors, hash-maps and the `@`
reader macro:
	1004	```
	1005	make "test^quux^step6"
	1006	```
	1007
	1008	Congratulations, you now have a full-fledged scripting language that
	1009	can run other mal programs. The `slurp` function loads a file as
	1010	a string, the `read-string` function calls the mal reader to turn that
	1011	string into data, and the `eval` function takes data and evaluates it
	1012	as a normal mal program. However, it is important to note that the
	1013	`eval` function is not just for running external programs. Because mal
	1014	programs are regular mal data structures, you can dynamically generate
	1015	or manipulate those data structures before calling `eval` on them.
	1016	This isomorphism (same shape) between data and programs is known as
	1017	"homoiconicity". Lisp languages are homoiconic and this property
	1018	distinguishes them from most other programming languages.
	1019
Your mal implementation is quite powerful already, but the set of
functions that are available (from `core.qx`) is fairly limited. The
bulk of the functions you will add are described in step 9 and step A,
but you will begin to flesh them out over the next few steps to
support quoting (step 7) and macros (step 8).
	1025
	1026
	1027	#### Deferrable:
	1028
	1029	* Add the ability to run another mal program from the command line.
	1030	Prior to the REPL loop, check if your mal implementation is called
	1031	with command line arguments. If so, treat the first argument as
	1032	a filename and use `rep` to call `load-file` on that filename, and
	1033	finally exit/terminate execution.
	1034
* Add the rest of the command line arguments to your REPL environment
  so that programs that are run with `load-file` have access to their
  calling environment. Add a new `*ARGV*` (symbol) entry to your REPL
  environment. The value of this entry should be the rest of the
  command line arguments as a mal list value.
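A Python sketch of both command-line items. `run` is a hypothetical entry point: `argv` would be `sys.argv` (whose first element is the script name), `rep` is your `rep` function, `repl_loop` is your existing REPL loop, and a dict stands in for the REPL `Env`. Binding `*ARGV*` before calling `load-file` lets the loaded program see it.

```python
def run(argv, rep, repl_env, repl_loop):
    """Entry-point sketch for step 6; argv[0] is the interpreter's own script name."""
    # *ARGV* holds the arguments after the mal file name, as a mal list.
    repl_env["*ARGV*"] = list(argv[2:])
    if len(argv) > 1:
        # Run the named file, then stop instead of starting the REPL.
        # (Assumes the path contains no double quotes or backslashes.)
        rep('(load-file "%s")' % argv[1])
        return
    repl_loop()
```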
	1040
	1041
	1042	<a name="step7"></a>
	1043
	1044	### Step 7: Quoting
	1045
	1046	![step7_quote architecture](step7_quote.png)
	1047
	1048	In step 7 you will add the special forms `quote` and `quasiquote` and
	1049	add supporting core functions `cons` and `concat`. The two quote forms
	1050	add a powerful abstraction for manipulating mal code itself
	1051	(meta-programming).
	1052
	1053	The `quote` special form indicates to the evaluator (`EVAL`) that the
	1054	parameter should not be evaluated (yet). At first glance, this might
	1055	not seem particularly useful but an example of what this enables is the
	1056	ability for a mal program to refer to a symbol itself rather than the
	1057	value that it evaluates to. Likewise with lists. For example, consider
	1058	the following:
	1059
* `(prn abc)`: this will look up the symbol `abc` in the current
  evaluation environment and print its value. This will result in an
  error if `abc` is not defined.
	1063	* `(prn (quote abc))`: this will print "abc" (prints the symbol
	1064	itself). This will work regardless of whether `abc` is defined in
	1065	the current environment.
	1066	* `(prn (1 2 3))`: this will result in an error because `1` is not
	1067	a function and cannot be applied to the arguments `(2 3)`.
	1068	* `(prn (quote (1 2 3)))`: this will print "(1 2 3)".
	1069	* `(def! l (quote (1 2 3)))`: list quoting allows us to define lists
	1070	directly in the code (list literal). Another way of doing this is
	1071	with the list function: `(def! l (list 1 2 3))`.
	1072
	1073	The second special quoting form is `quasiquote`. This allows a quoted
	1074	list to have internal elements of the list that are temporarily
	1075	unquoted (normal evaluation). There are two special forms that only
	1076	mean something within a quasiquoted list: `unquote` and
	1077	`splice-unquote`. These are perhaps best explained with some examples:
	1078
	1079	* `(def! lst (quote (2 3)))` -> `(2 3)`
	1080	* `(quasiquote (1 (unquote lst)))` -> `(1 (2 3))`
	1081	* `(quasiquote (1 (splice-unquote lst)))` -> `(1 2 3)`
	1082
	1083	The `unquote` form turns evaluation back on for its argument and the
	1084	result of evaluation is put in place into the quasiquoted list. The
	1085	`splice-unquote` also turns evaluation back on for its argument, but
	1086	the evaluated value must be a list which is then "spliced" into the
	1087	quasiquoted list. The true power of the quasiquote form will be
	1088	manifest when it is used together with macros (in the next step).
	1089
	1090	Compare the pseudocode for step 6 and step 7 to get a basic idea of
	1091	the changes that will be made during this step:
	1092	```
	1093	diff -urp ../process/step6_file.txt ../process/step7_quote.txt
	1094	```
	1095
	1096	* Copy `step6_file.qx` to `step7_quote.qx`.
	1097
* Before implementing the quoting forms, you will need to implement
  some supporting functions in the core namespace:
  * `cons`: this function takes a list as its second
    parameter and returns a new list that has the first argument
    prepended to it.
  * `concat`: this function takes 0 or more lists as
    parameters and returns a new list that is a concatenation of all
    the list parameters.
	1106
An aside on immutability: note that neither `cons` nor `concat` mutates
the original list arguments. Any references to them (i.e. other
lists that they may be "contained" in) will still refer to the
original unchanged value. Mal, like Clojure, is a language that uses
immutable data structures. I encourage you to read about the power and
importance of immutability as implemented in Clojure (from which
Mal borrows most of its syntax and feature-set).
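A Python sketch of `cons` and `concat` over Python lists. Both build fresh lists, which is what keeps the original arguments unchanged, as the aside on immutability describes.

```python
def cons(x, seq):
    """Return a new list with `x` prepended; `seq` itself is not modified."""
    return [x] + list(seq)


def concat(*seqs):
    """Return a new list holding the elements of every argument, in order."""
    result = []
    for seq in seqs:
        result.extend(seq)
    return result
```

Because `list(seq)` and `extend` accept any iterable, the same two functions also cover the deferrable vector support if vectors are represented by a different sequence type, and they still return a list.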
	1114
	1115	* Add the `quote` special form. This form just returns its argument
	1116	(the second list element of `ast`).
	1117
* Add the `quasiquote` special form. First implement a helper function
  `is_pair` that returns true if the parameter is a non-empty list.
  Then define a `quasiquote` function. This is called from `EVAL` with
  the first `ast` argument (second list element) and then `ast` is set
  to the result and execution continues at the top of the loop (TCO).
  The `quasiquote` function takes a parameter `ast` and has the
  following conditional:
  1. if `is_pair` of `ast` is false: return a new list containing:
     a symbol named "quote" and `ast`.
  2. else if the first element of `ast` is a symbol named "unquote":
     return the second element of `ast`.
  3. else if `is_pair` of the first element of `ast` is true and the
     first element of the first element of `ast` (`ast[0][0]`) is a
     symbol named "splice-unquote": return a new list containing: a
     symbol named "concat", the second element of the first element of
     `ast` (`ast[0][1]`), and the result of calling `quasiquote` with
     the second through last element of `ast`.
  4. otherwise: return a new list containing: a symbol named "cons", the
     result of calling `quasiquote` on the first element of `ast`
     (`ast[0]`), and the result of calling `quasiquote` with the second
     through last element of `ast`.
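The four cases translate almost line for line into Python. In this sketch, symbols are a hypothetical `Symbol` subclass of `str` and mal lists are Python lists; `is_symbol` is an added helper so that a mal string `"unquote"` is not mistaken for the symbol.

```python
class Symbol(str):
    """Symbols get their own type so they are not confused with mal strings."""


def is_symbol(x, name):
    return isinstance(x, Symbol) and x == name


def is_pair(x):
    return isinstance(x, list) and len(x) > 0


def quasiquote(ast):
    if not is_pair(ast):                                   # case 1
        return [Symbol("quote"), ast]
    if is_symbol(ast[0], "unquote"):                       # case 2
        return ast[1]
    if is_pair(ast[0]) and is_symbol(ast[0][0], "splice-unquote"):  # case 3
        return [Symbol("concat"), ast[0][1], quasiquote(ast[1:])]
    return [Symbol("cons"), quasiquote(ast[0]), quasiquote(ast[1:])]  # case 4
```

For `(quasiquote (1 (unquote lst)))` this returns `(cons (quote 1) (cons lst (quote ())))`; when `EVAL` runs that with `lst` bound to `(2 3)`, the result is `(1 (2 3))`, matching the earlier example.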
	1139
	1140
Now go to the top level and run the step 7 tests:
	1142	```
	1143	make "test^quux^step7"
	1144	```
	1145
Quoting is one of the more mundane functions available in mal, but do
not let that discourage you. Your mal implementation is almost
complete, and quoting sets the stage for the next very exciting step:
macros.
	1150
	1151
	1152	#### Deferrable
	1153
* The full names for the quoting forms are fairly verbose. Most Lisp
  languages have a short-hand syntax and Mal is no exception. These
  short-hand syntaxes are known as reader macros because they allow us
  to manipulate mal code during the reader phase. Macros that run
  during the eval phase are just called "macros" and are described in
  the next section. Expand the conditional in the reader's `read_form`
  function to add the following four cases:
  * token is "'" (single quote): return a new list that contains the
    symbol "quote" and the result of reading the next form
    (`read_form`).
  * token is "\`" (back-tick): return a new list that contains the
    symbol "quasiquote" and the result of reading the next form
    (`read_form`).
  * token is "~" (tilde): return a new list that contains the
    symbol "unquote" and the result of reading the next form
    (`read_form`).
  * token is "~@" (tilde + at sign): return a new list that contains
    the symbol "splice-unquote" and the result of reading the next
    form (`read_form`).
	1173
* Add support for quoting of vectors. The `is_pair` function should
  return true if the argument is a non-empty list or vector. `cons`
  should also accept a vector as the second argument. The return value
  is a list regardless. `concat` should support concatenation of
  lists, vectors, or a mix of both. The result is always a list.
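Because the four quoting reader macros follow the same pattern (and `@` from step 6 could join them), a lookup table keeps `read_form` short. This Python sketch reads from a plain token list and returns the form together with the next position; it assumes the step 1 tokenizer emits `~@` as a single token, and it represents symbols as plain strings.

```python
READER_MACROS = {
    "'": "quote",
    "`": "quasiquote",
    "~": "unquote",
    "~@": "splice-unquote",  # assumes the tokenizer emits "~@" as one token
}


def read_form(tokens, pos=0):
    """Read one form starting at tokens[pos]; return (form, next_pos)."""
    token = tokens[pos]
    if token in READER_MACROS:
        # Prefix token: wrap the next form as (symbol form).
        form, pos = read_form(tokens, pos + 1)
        return [READER_MACROS[token], form], pos
    if token == "(":
        items, pos = [], pos + 1
        while tokens[pos] != ")":  # an IndexError here means a missing ")"
            item, pos = read_form(tokens, pos)
            items.append(item)
        return items, pos + 1
    if token.isdigit():
        return int(token), pos + 1
    return token, pos + 1  # symbols are plain strings in this sketch
```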
	1179
	1180
	1181	<a name="step8"></a>
	1182
	1183	### Step 8: Macros
	1184
	1185	![step8_macros architecture](step8_macros.png)
	1186
Your mal implementation is now ready for one of the most lispy and
exciting of all programming concepts: macros. In the previous step,
quoting enabled some simple manipulation of data structures and therefore
manipulation of mal code (because the `eval` function from step
6 turns mal data into code). In this step you will be able to mark mal
functions as macros which can manipulate mal code before it is
evaluated. In other words, macros are user-defined special forms. Or
to look at it another way, macros allow mal programs to redefine
the mal language itself.
	1196
	1197	Compare the pseudocode for step 7 and step 8 to get a basic idea of
	1198	the changes that will be made during this step:
	1199	```
	1200	diff -urp ../process/step7_quote.txt ../process/step8_macros.txt
	1201	```
	1202
	1203	* Copy `step7_quote.qx` to `step8_macros.qx`.
	1204
	1205
	1206	You might think that the infinite power of macros would require some
	1207	sort of complex mechanism, but the implementation is actually fairly
	1208	simple.
	1209
	1210	* Add a new attribute `is_macro` to mal function types. This should
	1211	default to false.
	1212
	1213	* Add a new special form `defmacro!`. This is very similar to the
	1214	`def!` form, but before the evaluated value (mal function) is set in
	1215	the environment, the `is_macro` attribute should be set to true.
	1216
* Add an `is_macro_call` function: This function takes arguments `ast`
  and `env`. It returns true if `ast` is a list that contains a symbol
  as the first element and that symbol refers to a function in the
  `env` environment and that function has the `is_macro` attribute set
  to true. Otherwise, it returns false.
	1222
	1223	* Add a `macroexpand` function: This function takes arguments `ast`
	1224	and `env`. It calls `is_macro_call` with `ast` and `env` and loops
	1225	while that condition is true. Inside the loop, the first element of
	1226	the `ast` list (a symbol), is looked up in the environment to get
	1227	the macro function. This macro function is then called/applied with
	1228	the rest of the `ast` elements (2nd through the last) as arguments.
	1229	The return value of the macro call becomes the new value of `ast`.
	1230	When the loop completes because `ast` no longer represents a macro
	1231	call, the current value of `ast` is returned.
	1232
	1233	* In the evaluator (`EVAL`) before the special forms switch (apply
	1234	section), perform macro expansion by calling the `macroexpand`
	1235	function with the current value of `ast` and `env`. Set `ast` to the
	1236	result of that call. If the new value of `ast` is no longer a list
	1237	after macro expansion, then return the result of calling `eval_ast`
	1238	on it, otherwise continue with the rest of the apply section
	1239	(special forms switch).
	1240
	1241	* Add a new special form condition for `macroexpand`. Call the
	1242	`macroexpand` function using the first `ast` argument (second list
	1243	element) and `env`. Return the result. This special form allows
	1244	a mal program to do explicit macro expansion without applying the
	1245	result (which can be useful for debugging macro expansion).
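A Python sketch of the macro machinery. `MalFunction` here is pared down to a plain callable plus the `is_macro` flag (the step 5 version also carries `ast`, `params` and `env`), a hypothetical `define_macro` helper plays the role of `defmacro!` once its value has been evaluated, and symbols are a hypothetical `Symbol` subclass of `str`.

```python
from dataclasses import dataclass
from typing import Callable


class Symbol(str):
    """Symbols get their own type so they are not confused with mal strings."""


@dataclass
class MalFunction:
    fn: Callable            # the function's plain callable
    is_macro: bool = False  # set to True by defmacro!


class Env:
    def __init__(self, outer=None):
        self.outer, self.data = outer, {}

    def set(self, key, value):
        self.data[key] = value
        return value

    def find(self, key):
        env = self
        while env is not None:
            if key in env.data:
                return env
            env = env.outer
        return None

    def get(self, key):
        env = self.find(key)
        if env is None:
            raise KeyError(f"'{key}' not found")
        return env.data[key]


def define_macro(env, name, func):
    """What defmacro! does once its value is evaluated: flag it as a macro, then bind it."""
    # Copying keeps a function that is also bound with def! from becoming a macro.
    return env.set(name, MalFunction(func.fn, is_macro=True))


def is_macro_call(ast, env):
    if not (isinstance(ast, list) and ast and isinstance(ast[0], Symbol)):
        return False
    if env.find(ast[0]) is None:
        return False
    value = env.get(ast[0])
    return isinstance(value, MalFunction) and value.is_macro


def macroexpand(ast, env):
    while is_macro_call(ast, env):
        macro = env.get(ast[0])
        # Arguments are passed unevaluated: the macro receives code as data.
        ast = macro.fn(*ast[1:])
    return ast
```

In `EVAL`, the new code then amounts to `ast = macroexpand(ast, env)` just before the special forms switch, followed by the "no longer a list" check described above.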
	1246
Now go to the top level and run the step 8 tests:
	1248	```
	1249	make "test^quux^step8"
	1250	```
	1251
	1252	There is a reasonably good chance that the macro tests will not pass
	1253	the first time. Although the implementation of macros is fairly
	1254	simple, debugging runtime bugs with macros can be fairly tricky. If
	1255	you do run into subtle problems that are difficult to solve, let me
	1256	recommend a couple of approaches:
	1257
	1258	* Use the macroexpand special form to eliminate one of the layers of
	1259	indirection (to expand but skip evaluate). This will often reveal
	1260	the source of the issue.
* Add a debug print statement to the top of your main `eval` function
  (inside the TCO loop) to print the current value of `ast` (hint: use
  `pr_str` to get easier-to-debug output). Pull up the step 8
  implementation from another language and add (or uncomment) the same
  debug print in its `eval` function (yes, I give you permission to
  violate the rule this once). Run the two side-by-side. The first
  difference is likely to point to the bug.
	1268
	1269	Congratulations! You now have a Lisp interpreter with a super power
	1270	that most non-Lisp languages can only dream of (I have it on good
	1271	authority that languages dream when you are not using them). If you
	1272	are not already familiar with Lisp macros, I suggest the following
	1273	exercise: write a recursive macro that handles postfixed mal code
	1274	(with the function as the last parameter instead of the first). Or
	1275	not. I have not actually done so myself, but I have heard it is an
	1276	interesting exercise.
	1277
In the next step you will add try/catch style exception handling to
your implementation in addition to some new core functions. After
step 9 you will be very close to having a fully self-hosting mal
implementation. Let us continue!
	1282
	1283
	1284	#### Deferrable
	1285
* Add the following new core functions which are frequently used in
  macro functions:
  * `nth`: this function takes a list (or vector) and a number (index)
    as arguments and returns the element of the list at the given index.
    If the index is out of range, this function raises an exception.
  * `first`: this function takes a list (or vector) as its argument
    and returns the first element. If the list (or vector) is empty or
    is `nil` then `nil` is returned.
  * `rest`: this function takes a list (or vector) as its argument and
    returns a new list containing all the elements except the first.
	1296
* In the main program, use the `rep` function to define two new
  control structure macros. Here are the string arguments for `rep`
  to define these macros:
	1300	* `cond`: "(defmacro! cond (fn* (& xs) (if (> (count xs) 0) (list 'if (first xs) (if (> (count xs) 1) (nth xs 1) (throw \"odd number of forms to cond\")) (cons 'cond (rest (rest xs)))))))"
	1301	* `or`: "(defmacro! or (fn* (& xs) (if (empty? xs) nil (if (= 1 (count xs)) (first xs) `(let* (or_FIXME ~(first xs)) (if or_FIXME or_FIXME (or ~@(rest xs))))))))"
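A Python sketch of `nth`, `first` and `rest`, with lists and vectors both assumed to be Python sequences and `nil` as `None`. `nth` raises Python's `IndexError` here; a real implementation would raise (or convert it to) a mal exception once `throw` and `try*` exist in step 9. Treating `nil` as empty in `rest` is an extra assumption that mirrors `first`.

```python
def nth(seq, index):
    """Element at `index`; out-of-range (including negative) indexes raise."""
    if not 0 <= index < len(seq):
        raise IndexError("nth: index out of range")
    return seq[index]


def first(seq):
    """First element, or nil (None) for an empty sequence or nil."""
    if seq is None or len(seq) == 0:
        return None
    return seq[0]


def rest(seq):
    """New list of everything after the first element (always a list)."""
    if seq is None:
        return []
    return list(seq[1:])
```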
	1302
	1303
	1304	<a name="step9"></a>
	1305
	1306	### Step 9: Try
	1307
	1308	![step9_try architecture](step9_try.png)
	1309
	1310	In this step you will implement the final mal special form for
	1311	error/exception handling: `try/catch`. You will also add several core
	1312	functions to your implementation. In particular, you will enhance the
	1313	functional programming pedigree of your implementation by adding the
	1314	`apply` and `map` core functions.
	1315
	1316	Compare the pseudocode for step 8 and step 9 to get a basic idea of
	1317	the changes that will be made during this step:
	1318	```
	1319	diff -urp ../process/step8_macros.txt ../process/step9_try.txt
	1320	```
	1321
	1322	* Copy `step8_macros.qx` to `step9_try.qx`.
	1323
* Add the `try/catch` special form to the EVAL function. The
  try/catch form looks like this: `(try* A (catch* B C))`. The form
  `A` is evaluated; if it throws an exception, then form `C` is
  evaluated with a new environment that binds the symbol `B` to the
  value of the exception that was thrown.
  * If your target language has built-in try/catch style exception
    handling, then you are already 90% of the way done. Add a
    (native language) try/catch block that evaluates `A` within
    the try block and catches all exceptions. If an exception is
    caught, then translate it to a mal type/value. For native
    exceptions this is either the message string or a mal hash-map
    that contains the message string and other attributes of the
    exception. When a regular mal type/value is used as an
    exception, you will probably need to store it within a native
    exception type in order to be able to convey/transport it using
    the native try/catch mechanism. Then you will extract the mal
    type/value from the native exception. Create a new mal environment
    that binds `B` to the value of the exception. Finally, evaluate `C`
    using that new environment.
  * If your target language does not have built-in try/catch style
    exception handling, then you have some extra work to do. One of the
    most straightforward approaches is to create a global error
    variable that stores the thrown mal type/value. The complication
    is that there are a bunch of places where you must check to see if
    the global error state is set and return without proceeding. The
    rule of thumb is that this check should happen at the top of your
    EVAL function and also right after any call to EVAL (and after any
    function call that might happen to call EVAL further down the
    chain). Yes, it is ugly, but you were warned in the section on
    picking a language.
	1354
* Add the `throw` core function.
  * If your language supports try/catch style exception handling, then
    this function takes a mal type/value and throws/raises it as an
    exception. In order to do this, you may need to create a custom
    exception object that wraps a mal value/type.
  * If your language does not support try/catch style exception
    handling, then set the global error state to the mal type/value.

* Add the `apply` and `map` core functions. In step 5, if you did not
  add the original function (`fn`) to the structure returned from
  `fn*`, then you will need to do so now.
  * `apply`: takes at least two arguments. The first argument is
    a function and the last argument is a list (or vector). The
    arguments between the function and the last argument (if there are
    any) are concatenated with the final argument to create the
    arguments that are used to call the function. The apply
    function allows a function to be called with arguments that are
    contained in a list (or vector). In other words,
    `(apply F A B [C D])` is equivalent to `(F A B C D)`.
  * `map`: takes a function and a list (or vector), calls the
    function on every element of the list (or vector) one at
    a time, and returns the results as a list.

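Here is a minimal Python sketch of both functions. It assumes that mal lists and vectors are Python lists and that mal functions are plain Python callables. If your `fn*` values are structures, call their stored `fn` field instead.

```python
# Hypothetical representation: lists/vectors are Python lists and mal
# functions are Python callables.

def mal_apply(f, *args):
    # (apply F A B [C D]) == (F A B C D): the leading arguments are
    # passed as-is and the final list (or vector) is spread out.
    if not args:
        raise TypeError("apply needs a function and a list")
    *leading, final = args
    return f(*leading, *final)


def mal_map(f, seq):
    # Call f on each element in order and collect the results in a new list.
    return [f(x) for x in seq]
```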
* Add some type predicate core functions. In Lisp, predicates are
  functions that return true/false (or true value/nil) and typically
  end in "?" or "p".
  * `nil?`: takes a single argument and returns true (mal true value)
    if the argument is nil (mal nil value).
  * `true?`: takes a single argument and returns true (mal true value)
    if the argument is the mal true value itself (not merely any value
    that counts as true in a condition).
  * `false?`: takes a single argument and returns true (mal true
    value) if the argument is a false value (mal false value).
  * `symbol?`: takes a single argument and returns true (mal true
    value) if the argument is a symbol (mal symbol value).

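Here is a minimal Python sketch. It assumes that nil is `None`, that mal true/false are Python `True`/`False`, and that symbols are instances of a hypothetical `Symbol` subclass of `str`, so that symbols can be told apart from strings. The comparisons use `is` because in Python `1 == True`, but the number 1 is not the mal true value.

```python
# Hypothetical representation: nil is None, true/false are True/False,
# and symbols are instances of Symbol (plain str values are mal strings).

class Symbol(str):
    """Marks a str as a mal symbol rather than a mal string."""


def is_nil(x):
    return x is None


def is_true(x):
    # `is`, not `==`: 1 == True in Python, but 1 is not mal true.
    return x is True


def is_false(x):
    return x is False


def is_symbol(x):
    return isinstance(x, Symbol)
```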
Now go to the top level and run the step 9 tests:
```
make "test^quux^step9"
```

Your mal implementation is now essentially a fully featured Lisp
interpreter. But if you stop now you will miss one of the most
satisfying and enlightening aspects of creating a mal implementation:
self-hosting.

#### Deferrable

* Add the following new core functions:
  * `symbol`: takes a string and returns a new symbol with the string
    as its name.
  * `keyword`: takes a string and returns a keyword with the same name
    (usually just by prepending the special keyword unicode symbol).
    This function should also detect if the argument is already a
    keyword and just return it.
  * `keyword?`: takes a single argument and returns true (mal true
    value) if the argument is a keyword, otherwise returns false (mal
    false value).
  * `vector`: takes a variable number of arguments and returns
    a vector containing those arguments.
  * `vector?`: takes a single argument and returns true (mal true
    value) if the argument is a vector, otherwise returns false (mal
    false value).
  * `hash-map`: takes a variable but even number of arguments and
    returns a new mal hash-map value with keys from the odd arguments
    and values from the even arguments respectively. This is basically
    the functional form of the `{}` reader literal syntax.
  * `map?`: takes a single argument and returns true (mal true
    value) if the argument is a hash-map, otherwise returns false (mal
    false value).
  * `assoc`: takes a hash-map as the first argument and the remaining
    arguments are odd/even key/value pairs to "associate" (merge) into
    the hash-map. Note that the original hash-map is unchanged
    (remember, mal values are immutable), and a new hash-map
    containing the old hash-map's key/values plus the merged key/value
    arguments is returned.
  * `dissoc`: takes a hash-map followed by the keys to remove from the
    hash-map. Again, note that the original hash-map is unchanged and
    a new hash-map with the keys removed is returned. Key arguments
    that do not exist in the hash-map are ignored.
  * `get`: takes a hash-map and a key and returns the value of looking
    up that key in the hash-map. If the key is not found in the
    hash-map then nil is returned.
  * `contains?`: takes a hash-map and a key and returns true (mal true
    value) if the key exists in the hash-map and false (mal false
    value) otherwise.
  * `keys`: takes a hash-map and returns a list (mal list value) of
    all the keys in the hash-map.
  * `vals`: takes a hash-map and returns a list (mal list value) of
    all the values in the hash-map.
  * `sequential?`: takes a single argument and returns true (mal true
    value) if it is a list or a vector, otherwise returns false (mal
    false value).

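Here is a minimal Python sketch of the keyword and hash-map functions. It assumes that hash-maps are Python dicts and that keywords are strings prefixed with `\u029e` ("ʞ"), which is the prefix several existing mal implementations use. Treat the exact prefix as an assumption; any character that cannot start an ordinary string will do. Every function builds a new dict, so existing hash-maps are never mutated.

```python
# Hypothetical representation: hash-maps are dicts, strings are str,
# and a keyword is a str starting with KEYWORD_PREFIX.

KEYWORD_PREFIX = "\u029e"


def keyword(s):
    # Return s unchanged if it is already a keyword.
    return s if s.startswith(KEYWORD_PREFIX) else KEYWORD_PREFIX + s


def is_keyword(x):
    return isinstance(x, str) and x.startswith(KEYWORD_PREFIX)


def hash_map(*kvs):
    if len(kvs) % 2 != 0:
        raise ValueError("hash-map needs an even number of arguments")
    return dict(zip(kvs[0::2], kvs[1::2]))


def assoc(hm, *kvs):
    new = dict(hm)                 # copy; the original is left untouched
    new.update(hash_map(*kvs))
    return new


def dissoc(hm, *ks):
    # Keys that are not present are silently ignored.
    return {k: v for k, v in hm.items() if k not in ks}


def get(hm, key):
    return hm.get(key)             # None (nil) when the key is missing


def contains(hm, key):
    return key in hm


def keys(hm):
    return list(hm.keys())


def vals(hm):
    return list(hm.values())
```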
<a name="stepA"></a>

### Step A: Metadata, Self-hosting and Interop

![stepA_mal architecture](stepA_mal.png)

You have reached the final step of your mal implementation. This step
is kind of a catchall for things that did not fit into other steps.
But most importantly, the changes you make in this step will unlock
the magical power known as "self-hosting". You might have noticed
that one of the languages that mal is implemented in is "mal". Any mal
implementation that is complete enough can run the mal implementation
of mal. You might need to pull out your hammock and ponder this for
a while if you have never built a compiler or interpreter before. Look
at the step source files for the mal implementation of mal (it is not
cheating now that you have reached step A).

If you deferred the implementation of keywords, vectors and hash-maps,
now is the time to go back and implement them if you want your
implementation to self-host.

Compare the pseudocode for step 9 and step A to get a basic idea of
the changes that will be made during this step:
```
diff -urp ../process/step9_try.txt ../process/stepA_mal.txt
```

* Copy `step9_try.qx` to `stepA_mal.qx`.

* Add the `readline` core function. This function takes a
  string that is used to prompt the user for input. The line of text
  entered by the user is returned as a string. If the user sends an
  end-of-file (usually Ctrl-D), then nil is returned.

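Here is a minimal Python sketch, assuming mal nil is represented as `None`. Python's built-in `input()` already prints the prompt, strips the trailing newline, and raises `EOFError` on end-of-file, so the function only needs to map that error to nil. Importing Python's standard `readline` module (where the platform provides it) also gives `input()` line editing and history.

```python
# Hypothetical representation: mal nil is None, mal strings are str.

def mal_readline(prompt):
    try:
        return input(prompt)       # prompt is printed, newline stripped
    except EOFError:               # user sent end-of-file (e.g. Ctrl-D)
        return None
```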
* Add metadata support to mal functions. TODO. Should be separate
  from the function macro flag.

* Add a new `*host-language*` (symbol) entry to your REPL
  environment. The value of this entry should be a mal string
  containing the name of the current implementation.

* When the REPL starts up (as opposed to when it is called with
  a script and/or arguments), call the `rep` function with this string
  to print a startup header:
  `"(println (str \"Mal [\" *host-language* \"]\"))"`.

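Here is a hypothetical Python sketch of the startup logic. `rep` and `env_set` stand in for your own `rep` function and a setter on the REPL environment, and `host_name` is whatever name you choose for your implementation. Other step 6 details, such as binding `*ARGV*`, are omitted, and file names containing quotes are not handled.

```python
# Hypothetical sketch: rep and env_set are your own functions from
# earlier steps; host_name is the name your implementation reports.

def start(argv, rep, env_set, host_name):
    env_set("*host-language*", host_name)   # stored as a mal string
    if len(argv) > 1:
        # Script mode from step 6: run the file, no banner, no REPL.
        rep(f'(load-file "{argv[1]}")')
        return False
    # Interactive mode: print the startup header; the caller then
    # enters the usual read-eval-print loop.
    rep('(println (str "Mal [" *host-language* "]"))')
    return True
```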
Now go to the top level and run the step A tests:
```
make "test^quux^stepA"
```

Once you have passed all the non-optional step A tests, it is time to
try self-hosting. Run your step A implementation as normal, but use
the file argument mode you added in step 6 to run each of the steps
from the mal implementation:
```
./stepA_mal.qx ../mal/step1_read_print.mal
./stepA_mal.qx ../mal/step2_eval.mal
...
./stepA_mal.qx ../mal/step9_try.mal
./stepA_mal.qx ../mal/stepA_mal.mal
```

There is a very good chance that you will encounter an error at some
point while trying to run the mal in mal implementation steps above.
Debugging failures that happen while self-hosting is MUCH more
difficult and mind bending. One of the best approaches I have
personally found is to add `prn` statements to the mal implementation
step (not your own implementation of mal) that is causing problems.

Another approach I have frequently used is to pull out the code from
the mal implementation that is causing the problem and simplify it
step by step until you have a simple piece of mal code that still
reproduces the problem. Once the reproducer is simple enough you will
probably know where in your own implementation the problem is likely
to be. Please add your simple reproducer as a test case so that future
implementers will fix similar issues in their code before they get to
self-hosting, where such issues are much more difficult to track down
and fix.

Once you can manually run all the self-hosted steps, it is time to run
all the tests in self-hosted mode:
```
make MAL_IMPL=quux "test^mal"
```

When you run into problems (which you almost certainly will), use the
same process described above to debug them.

Congratulations!!! When all the tests pass, you should pause for
a moment and consider what you have accomplished. You have implemented
a Lisp interpreter that is powerful and complete enough to run a large
mal program which is itself an implementation of the mal language. You
might even be asking if you can continue the "inception" by using your
implementation to run a mal implementation which itself runs the mal
implementation.

#### Optional: gensym

The `or` macro we introduced at step 8 has a bug. It defines a
variable called `or_FIXME`, which "shadows" any binding with the same
name in the user's code that calls the macro. If a user has a variable
called `or_FIXME`, it cannot be used as an `or` macro argument. In order to
fix that, we'll introduce `gensym`: a function which returns a symbol
which was never used before anywhere in the program. This is also an
example of the use of mal atoms to keep state (the state here being
the number of symbols produced by `gensym` so far).

Previously you used `rep` to define the `or` macro. Remove that
definition and use `rep` to define the new counter, the `gensym`
function and the clean `or` macro. Here are the string arguments you
need to pass to `rep`:
```
"(def! gensym-counter (atom 0))"

"(def! gensym (fn* [] (symbol (str \"G__\" (swap! gensym-counter (fn* [x] (+ 1 x)))))))"

"(defmacro! or (fn* (& xs) (if (empty? xs) nil (if (= 1 (count xs)) (first xs) (let* (condvar (gensym)) `(let* (~condvar ~(first xs)) (if ~condvar ~condvar (or ~@(rest xs)))))))))"
```

For extra information read [Peter Seibel's thorough discussion about
`gensym` and leaking macros in Common Lisp](http://www.gigamonkeys.com/book/macros-defining-your-own.html#plugging-the-leaks).

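The Python sketch below is a hypothetical port of the `gensym` counter and the clean `or` expansion. It uses symbols as strings and lists as lists, and `itertools.count` plays the role of the `gensym-counter` atom. Because each expansion gets a fresh variable name, it cannot reuse the name chosen by an earlier expansion.

```python
# Hypothetical port: symbols are str, lists are lists. itertools.count
# stands in for the gensym-counter atom (first value 1, like swap! on 0).
import itertools

_counter = itertools.count(1)


def gensym():
    # Each call returns a fresh symbol name: G__1, G__2, ...
    return f"G__{next(_counter)}"


def expand_or(xs):
    # Mirrors the clean `or` macro: bind the first form to a fresh name.
    if not xs:
        return None
    if len(xs) == 1:
        return xs[0]
    condvar = gensym()
    return ["let*", [condvar, xs[0]],
            ["if", condvar, condvar, ["or", *xs[1:]]]]
```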
#### Optional additions

* Add metadata support to composite data types, symbols and native
  functions. TODO
* Add the following new core functions:
  * `time-ms`: takes no arguments and returns the number of
    milliseconds since epoch (00:00:00 UTC January 1, 1970), or, if
    that is not possible, since another point in time (`time-ms` is
    usually used only to measure elapsed time, so the reference point
    rarely matters). After `time-ms` is implemented, you can run the
    mal implementation performance benchmarks by running
    `make perf^quux`.
  * `conj`: takes a collection and one or more elements as arguments
    and returns a new collection which includes the original
    collection and the new elements. If the collection is a list, a
    new list is returned with the elements inserted at the start of
    the given list in opposite order; if the collection is a vector, a
    new vector is returned with the elements added to the end of the
    given vector.
  * `string?`: takes a single argument and returns true if it is a
    string (and false for keywords, even if your implementation stores
    keywords as strings internally).
  * `seq`: takes a list, vector, string, or nil. If nil, an empty list,
    an empty vector, or an empty string ("") is passed in then nil is
    returned. Otherwise, a list is returned unchanged, a vector is
    converted into a list, and a string is converted to a list
    containing the original string split into single character
    strings.

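Here is a minimal Python sketch of `time-ms`, `conj` and `seq`. It assumes that mal lists are Python lists, that mal vectors are instances of a hypothetical `Vector` subclass of `list`, that strings are `str`, and that nil is `None`.

```python
# Hypothetical representation: lists are Python lists, vectors are
# Vector instances, strings are str, nil is None.
import time


class Vector(list):
    """Marks a Python list as a mal vector."""


def time_ms():
    # Milliseconds since the Unix epoch.
    return int(time.time() * 1000)


def conj(coll, *elems):
    if isinstance(coll, Vector):
        return Vector([*coll, *elems])      # vector: append at the end
    return [*reversed(elems), *coll]        # list: prepend, order reversed


def seq(x):
    if x is None or len(x) == 0:
        return None                         # nil, (), [] and "" -> nil
    if isinstance(x, (Vector, str)):
        return list(x)                      # vector -> list, "ab" -> ["a", "b"]
    return x                                # a non-empty list is returned unchanged
```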
## TODO:

* simplify: "X argument (list element Y)" -> ast[Y]
* list of types with metadata: list, vector, hash-map, mal functions
* more clarity about when to peek and poke in read_list and read_form
* tokenizer: use first group rather than whole match (to eliminate
  whitespace/commas)