guide.md: step9,A stubs. TODO. More vector tests.
[jackhill/mal.git] / process / guide.md
CommitLineData
0f4ca9d1
JM
1# The Make-A-Lisp Process
2
3So you want to write a Lisp interpreter? Welcome!
4
5The goal of the Make-A-Lisp project is to make it easy to write your
6own Lisp interpreter without sacrificing those many "Aha!" moments
7that come from ascending the McCarthy mountain. When you reach the peak
8of this particular mountain, you will have an interpreter for the mal
9Lisp language that is powerful enough to be self-hosting, meaning it
10will be able to run a mal interpreter written in mal itself.
11
12So jump right in (er ... start the climb)!
13
14## Pick a language
15
16You might already have a language in mind that you want to use.
17Technically speaking, mal can be implemented in any sufficiently
18complete programming (i.e. Turing complete), however, there are a few
19language features that can make the task MUCH easier. Here are some of
20them in rough order of importance:
21
22* A sequential compound data structure (e.g. arrays, lists,
23 vectors, etc)
24* An associative compound data structure (e.g. a dictionary,
25 hash-map, associative array, etc)
26* Function references (first class functions, function pointers,
27 etc)
28* Real exception handling (try/catch, raise, throw, etc)
29* Variable argument functions (variadic, var args, splats, apply, etc)
30* Function closures
31* PCRE regular expressions
32
33In addition, the following will make your task especially easy:
34
bf518367
JM
35* Dynamic typing / boxed types (specifically, the ability to store
36 different data types in the sequential and associative structures
37 and the language keeps track of the type for you)
0f4ca9d1
JM
38* Compound data types support arbitrary runtime "hidden" data
39 (metadata, metatables, dynamic fields attributes)
40
41Here are some examples of languages that have all of the above
42features: JavaScript, Ruby, Python, Lua, R, Clojure.
43
44Many of the most popular languages already have Mal implementations.
45However, this should not discourage you from creating your own
46implementation in a language that already has one. However, if you go
47this route, I suggest you avoid referring to the existing
48implementations (i.e. "cheating") to maximize your learning experience
49instead of just borrowing mine. On the other hand, if your goal is to
50add new implementations to mal as efficiently as possible, then you
51SHOULD find the most similar target language implementation and refer
52to it frequently.
53
bf518367
JM
54If you want a fairly long list of programming languages with an
55approximate measure of popularity, try the [Programming Language
56Popularity Chart](http://langpop.corger.nl/)
57
0f4ca9d1
JM
58
59## Getting started
60
61* Install your chosen language interpreter/compiler, language package
62 manager and build tools (if applicable)
63
64* Fork the mal repository on github and then clone your forked
65 repository:
66```
67git clone git@github.com:YOUR_NAME/mal.git
68cd mal
69```
70
71* Make a new directory for your implementation. For example, if you
72* language is called "quux":
73```
74mkdir quux
75```
76
77* Modify the top level Makefile to allow the tests to be run against
78 your implementation. For example, if your language is named "quux"
79 and uses "qx" as the file extension, then make the following
80 3 modifications to Makefile:
81```
82IMPLS = ... quux ...
83...
84quux_STEP_TO_PROG = mylang/$($(1)).qx
85...
86quux_RUNSTEP = ../$(2) $(3)
87```
88
89This allows you to run tests against your implementation like this:
90```
91make test^quux^stepX
92```
93
94
95## General hints
96
97Stackoverflow and Google are your best friends. Modern polyglot
98developers do not memorize dozens of programming languages. Instead,
99they learn the peculiar terminology used with each language and then
100use this to search for their answers.
101
102Here are some other resources where multiple languages are
103compared/described:
104* http://learnxinyminutes.com/
105* http://hyperpolyglot.org/
106* http://rosettacode.org/
107
108Do not let yourself be bogged down by specific problems. While the
109make-a-lisp process is structured as a series of steps, the reality is
110that building a lisp interpreter is more like a branching tree. If you
111get stuck on tail call optimization, or hash-maps, move on to other
112things. You will often have a stroke of inspiration for a problem as
113you work through other functionality. I have tried to structure this
114guide and the tests to make clear which things are optional or can be
115deferred until later.
116
117An aside on optional bits: when you run the tests for a given step,
118the last tests are often marked with an "optional" header. This
119indicates that these are tests for functionality that is not critical
120to finish a basic mal implementation. Many of the steps in this
121process guide also have an "Optional" section, however, it is not
122quite the same meaning. Those sections do include the functionality
123that is marked as optional in the tests, but they also include
124functionality that becomes mandatory at a later step. In other words,
125this is a "make your own Lisp adventure".
126
127Use test driven development. Each step of the make-a-lisp process has
128a bunch of tests associated with it and there is an easy script to run
129all the tests for a specific step in the process. Pick a failing test,
130fix it, repeat until all the tests for that step pass.
131
132The `process` directory contains abbreviated pseudocode and
133architecture images for each step of the make-a-lisp process. Use
134a textual diff/comparison tool to compare the previous pseudocode step
135with the one you are working on. The architecture images have changes
136from the previous step highlighted in red.
137
138If you get stuck, find the same step or functionality in a different
139implementation language.
140
141
142## The Make-A-Lisp Process
143
144In the steps that follow the name of the target language is "quux" and
145the file extension for that language is "qx".
146
147
148<a name="step0"></a>
149
150### Step 0: The REPL
151
152![step0_repl architecture](step0_repl.png)
153
154This step is basically just creating a skeleton of your interpreter.
155
156* Create a `step0_repl.qx` file in `quux/`.
157
158* Add the 4 trivial functions `READ`, `EVAL`, `PRINT`, and `rep`
159 (read-eval-print). `READ`, `EVAL`, and `PRINT` are basically just
160 stubs that return their first parameter (a string if your target
161 language is a statically typed) and `rep` calls them in order
162 passing the return to the input of the next.
163
164* Add a main loop that repeatedly prints a prompt, gets a line of
165 input from the user, calls `rep` with that line of input, and then
166 prints out the result from `rep`. It should also exit when you send
167 it an EOF (often Ctrl-D).
168
169* If you are using a compiled (ahead-of-time rather than just-in-time)
170 language, then create a Makefile (or appropriate project definition
171 file) in your directory.
172
173Run your new program and make sure that it echos each line that you
174type. Because step0 is so trivial, there are no automated tests to run
175for it.
176
177Add and then commit your new `step0_repl.qx` and `Makefile` to git.
178
179Congratulations! You have just completed the first step of the
180make-a-lisp process.
181
182
183#### Optional:
184
185* Add full line editing and command history support to your
186 interpreter REPL. Many languages have a library/module that provide
187 line editing support. Another option if your language supports it is
188 to use an FFI (foreign function interface) to load and call directly
189 into GNU readline, editline, or libnoise library. Add line
190 editing interface code to `readline.qx`
191
192
193<a name="step1"></a>
194
195### Step 1: Read and Print
196
197![step1_read_print architecture](step1_read_print.png)
198
199In this step, your interpreter will "read" the string from the user
200and parse it into an internal tree data structure (an abstract syntax
201tree) and then take that data structure and "print" it back to
202a string.
203
204In non-lisp languages, this step (called "lexing and parsing") can be
205one of the most complicated parts of the compiler/interpreter. In
206Lisp, the data structure that you want in memory is basically
207represented directly in the code that the programmer writes
208(homoiconicity).
209
210For example, if the string is "(+ 2 (* 3 4))" then the read function
211will process this into a tree structure that looks like this:
212```
213 List
214 / | \
215 / | \
216 / | \
217 Sym:+ Int:2 List
218 / | \
219 / | \
220 / | \
221 Sym:* Int:3 Int:4
222```
223
224Each left paren and its matching right paren (lisp "sexpr") becomes
225a node in the tree and everything else becomes a leaf in the tree.
226
227If you can find code for an implementation of a JSON encoder/decoder
228in your target language then you can probably just borrow and modify
229that and be 75% of the way done with this step.
230
231The rest of this section is going to assume that you are not starting
232from an existing JSON encoder/decoder, but that you do have access to
233a Perl compatible regular expressions (PCRE) module/library. You can
234certainly implement the reader using simple string operations, but it
235is more involved. The `make`, `ps` (postscript) and Haskell
236implementations have examples of a reader/parser without using regular
237expression support.
238
239* Copy `step0_repl.qx` to `step1_read_print.qx`.
240
241* Add a `reader.qx` file to hold functions related to the reader.
242
243* If the target language has objects types (OOP), then the next step
244 is to create a simple stateful Reader object in `reader.qx`. This
245 object will store the tokens and a position. The Reader object will
246 have two methods: `next` and `peek`. `next` returns the tokens at
247 the current position and increments the position. `peek` just
248 returns the token at the current position.
249
250* Add a function `read_str` in `reader.qx`. This function
251 will call `tokenizer` and then create a new Reader object instance
252 with the tokens. Then it will call `read_form` with the Reader
253 instance.
254
255* Add a function `tokenizer` in `reader.qx`. This function will take
256 a single single string and return an array/list
257 of all the tokens (strings) in it. The following regular expression
258 (PCRE) will match all mal tokens.
259```
260[\s,]*(~@|[\[\]{}()'`~^@]|"(?:\\.|[^\\"])*"|;.*|[^\s\[\]{}('"`,;)]*)
261```
262
263* Add the function `read_form` to `reader.qx`. This function
264 will peek at the first token in the Reader object and switch on the
265 first character of that token. If the character is a left paren then
266 `read_list` is called with the Reader object. Otherwise, `read_atom`
267 is called with the Reader Object. The return value from `read_form`
268 is a mal data type. If your target language is statically typed then
269 you will need some way for `read_form` to return a variant or
270 subclass type. For example, if your language is object oriented,
271 then you cal define a top level MalType (in `types.qx`) that all
272 your mal data types inherit from. The MalList type (which also
273 inherits from MalType) will contains a list/array of other MalTypes.
274 If your language is dynamically typed then you can likely just
275 return a plain list/array of other mal types.
276
277* Add the function `read_list` to `reader.qx`. This function will
278 repeatedly call `read_form` with the Reader object until it
279 encounters a ')' token (if it reach EOF before reading a ')' then
280 that is an error). It accumulates the results into a List type. If
281 your language does not have a sequential data type that can hold mal
282 type values you may need to implement one (in `types.qx`). Note
283 that `read_list` repeatedly calls `read_form` rather than
284 `read_atom`. This mutually recursive defintion between `read_list`
285 and `read_form` is what allows lists to contain lists.
286
287* Add the function `read_atom` to `reader.qx`. This function will
288 look at the contents of the token and return the appropriate scalar
289 (simple/single) data type value. Initially, you can just implement
290 numbers (integers) and symbols . This will allow you to proceed
291 through the next couple of steps before you will need to implement
292 the other fundamental mal types: nil, true, false, and string. The
293 remaining mal types: keyword, vector, hash-map, and atom do not
294 need to be implemented until step 9 (but can be implemented at any
295 point between this step and that). BTW, symbols types are just an
296 object that contains a single string name value (some languages have
297 symbol types already).
298
299* Add a file `printer.qx`. This file will contain a single function
300 `pr_str` which does the opposite of `read_str`: take a mal data
301 structure and return a string representation of it. But `pr_str` is
302 much simpler and is basically just a switch statement on the type of
303 the input object:
304
305 * symbol: return the string name of the symbol
306 * number: return the number as a string
307 * list: iterate through each element of the list calling `pr_str` on
308 it, then join the results with a space separator, and surround the
309 final result with parens
310
311* Change the `READ` function in `step1_read_print.qx` to call
312 `reader.read_str` and the `PRINT` function to call `printer.pr_str`.
313 `EVAL` continues to simply return its input but the type is now
314 a mal data type.
315
316You now have enough hooked up to begin testing your code. You can
317manually try some simple inputs:
318 * `123` -> `123`
319 * ` 123 ` -> `123`
320 * `abc` -> `abc`
321 * ` abc ` -> `abc`
322 * `(123 456)` -> `(123 456)`
323 * `( 123 456 789 ) ` -> `(123 456 789)`
324 * `( + 2 (+ 3 4) ) ` -> `(+ 2 (* 3 4))`
325
326To verify that your code is doing more than just eliminating extra
327spaces (and not failing), you can instrument your `reader.qx` functions.
328
329Once you have gotten past those simple manual tests, it is time to run
330the full suite of step 1 tests. Go to the top level and run the
331following:
332```
333make test^quux^step1
334```
335
336Fix any test failures related to symbols, numbers and lists.
337
338Depending on the functionality of your target language, it is likely
339that you have now just completed one of the most difficult steps. It
340is down hill from here. The remaining steps will probably be easier
341and each step will give progressively more bang for the buck.
342
343#### Optional:
344
345
346* Add error checking to your reader functions to make sure parens
347 are properly matched. Catch and print these errors in your main
348 loop. If your language does not have try/catch style bubble up
349 exception handling, then you will need to add explicit error
350 handling to your code to catch and pass on errors without crashing.
351
352* Add support for the other basic data type to your reader and printer
353 functions: string, nil, true, and false. These become mandatory at
354 step 4. When a string is read, a slash followed by a doublequote is
355 translatd into a plain doublequote character and a slash followed by
356 "n" is translated into a newline. To properly print a string (for
357 step 4 string functions), the `pr_str` function needs another
358 parameter called `print_readably`. When `print_readably` is true,
359 doublequotes and newlines are translated into their printed
360 representations (the reverse of the reader). The `PRINT` function in
361 the main program should call `pr_str` with print_readably set to
362 true.
363
364* Add support for the other mal types: keyword, vector, hash-map, and
365 atom. TODO/TBD
366 * keyword: just a string stored with unicode prefix (or char 127 if
367 no unicode support).
368 * vector: can be implemented with same underlying type as list if
369 there is some mechanism for marking/distringuishing from a list.
370 * hash-map: only need to implement string keys (which enables
371 keyword keys since they are just special strings).
372
373* Add support for reader macros which are special forms that are
374 transformed into other forms during the read phase.
375
376* Add comment support to your reader. The tokenizer should ignore
377 tokens that start with ";". Your `read_str` function will need to
378 properly handle when the tokenizer returns no values. The simplest
379 way to do this is to return `nil` mal value. A cleaner option (that
380 does not print `nil` at the prompt is to throw a special exception
381 that causes the main loop to simply continue at the beginning of the
382 loop without calling `rep`.
383
384
385<a name="step2"></a>
386
387### Step 2: Eval
388
389![step2_eval architecture](step2_eval.png)
390
391In step 1 your mal interpreter was basically just a way to validate
392input and eliminate extraneous white space. In this step you will turn
393your interpreter into a simple number calculator by adding
394functionality to the evaluator (`EVAL`).
395
396Compare the pseudocode for step 1 and step 2 to get a basic idea of
397the changes that will be made during this step:
398```
399diff -urp ../process/step1_read_print.txt ../process/step2_eval.txt
400```
401
402* Copy `step1_read_print.qx` to `step2_eval.qx`.
403
404* Define a simple initial REPL environment. This environment is an
405 associative structure that maps symbols (or symbol names) to
406 numeric functions. For example, in python this would look something
407 like this:
408```
409repl_env = {'+': lambda a,b: a+b,
410 '-': lambda a,b: a-b,
411 '*': lambda a,b: a*b,
412 '/': lambda a,b: int(a/b)}
413```
414
415* Modify the `rep` function to pass the REPL environment as the second
416 parameter for the `EVAL` call.
417
418* Create a new function `eval_ast` which takes `ast` (mal data type)
419 and an associative structure (the environment from above).
420 `eval_ast` switches on the type of `ast` as follows:
421
422 * symbol: lookup the symbol in the environment structure and return
423 the value or raise an error no value is found
424 * list: return a new list that is the result of calling `EVAL` on
425 each of the members of the list
426 * otherwise just return the original `ast` value
427
428* Modify `EVAL` to check if the first parameter `ast` is a list.
429 * `ast` is not a list: then return the result of calling `eval_ast`
430 on it.
431 * `ast` is a list: call `eval_ast` to get a new evaluated list. Take
432 the first item of the evaluated list and call it as function using
433 the rest of the evaluated list as its arguments.
434
435If your target language does not have full variable length argument
436support (e.g. variadic, vararg, splats, apply) then you will need to
437pass the full list of arguments as a single parameter and split apart
438the individual values inside of every mal function. This is annoying,
439but workable.
440
441The process of taking a list and invoking or executing it to return
442something new is known in Lisp as the "apply" phase.
443
444Try some simple expressions:
445
446 * `(+ 2 3)` -> `5`
447 * `(+ 2 (* 3 4))` -> `14`
448
449The most likely challenge you will encounter is how to properly call
450a function references using an arguments list.
451
452Now go to the top level, run the step 2 tests and fix the errors.
453```
454make test^quux^step2
455```
456
457You now have a simple prefix notation calculator!
458
459
460<a name="step3"></a>
461
462### Step 3: Environments
463
464![step3_env architecture](step3_env.png)
465
466In step 2 you were already introduced to REPL environment (`repl_env`)
467where the basic numeric functions were stored and looked up. In this
468step you will add the ability to create new environments (`let*`) and
469modify exiting environments (`def!`).
470
471A Lisp environment is an associative data structure that maps symbols (the
472keys) to values. But Lisp environments have an additional important
473function: they can refer to another environment (the outer
474environment). During environment lookups, if the current environment
475does not have the symbol, the lookup continues in the outer
476environment, and continues this way until the symbol is either found,
477or the outer environment is `nil` (the outermost environment in the
478chain).
479
480Compare the pseudocode for step 2 and step 3 to get a basic idea of
481the changes that will be made during this step:
482```
483diff -urp ../process/step2_eval.txt ../process/step3_env.txt
484```
485
486* Copy `step2_eval.qx` to `step2_env.qx`.
487
488* Create `env.qx` to hold the environment definition.
489
490* Define an `Env` object that is instantiated with a single `outer`
491 parameter and starts with an empty associative data structure
492 property `data`.
493
494* Define three methods for the Env object:
495 * set: takes a symbol key and a mal value and adds to the `data`
496 structure
497 * find: takes a symbol key and if the current environment contains
498 that key then return the environment. If no key is found and outer
499 is not `nil` then call find (recurse) on the outer environment.
500 * get: takes a symbol key and uses the `find` method to locate the
501 environment with the key, then returns the matching value. If no
502 key is found up the outer chain, then throws/raises a "not found"
503 error.
504
505* Update `step2_env.qx` to use the new `Env` type to create the
506 repl_env (with a `nil` outer value) and use the `set` method to add
507 the numeric functions.
508
509* Modify `eval_ast` to call the `get` method on the `env` parameter.
510
511* Modify the apply section of `EVAL` to switch on the first element of
512 the list:
513 * symbol "def!": call the set method of the current environment
514 (second parameter of `EVAL` called `env`) using the unevaluated
515 first parameter (second list element) as the symbol key and the
516 evaluated second parameter as the value.
517 * symbol "let*": create a new environment using the current
518 environment as the outer value and then use the first parameter as
519 a list of new bindings in the "let" environment. Take the second
520 element of the binding list, call `EVAL` using the new "let*"
521 environment as the evaluation environment, then call `set` on the
522 "let" environment using the first binding list element as the key
523 and the evaluated second element as the value. This is repeated
524 for each odd/even pair in the binding list. Note in particular,
525 the bindings earlier in the list can be referred to by later
526 bindings. Finally, the second parameter (third element) of the
527 original `let*` form is evaluated using the new "let*" environment
528 and the result is returned as the result of the `let*` (the new
529 let environment is discarded upon completion).
530 * otherwise: call `eval_ast` on the list and apply the first element
531 to the rest as before.
532
533`def!` and `let*` are Lisp "specials" (or "special atoms") which means
534that they are language level features and more specifically that the
535rest of the list elements (arguments) may be evaluated differently (or
536not at all) unlike the default apply case where all elements of the
537list are evaluated before the first element is invoked. Lists which
538contain a "special" as the first element are known as "special forms".
539The are special because the follow special evaluation rules.
540
541Try some simple environment tests:
542
543 * `(def! a 6)` -> `6`
544 * `a` -> `6`
545 * `(def! b (+ a 2))` -> `8`
546 * `(+ a b)` -> `14`
547 * `(let* (c 2) c)` -> `2`
548
549Now go to the top level, run the step 3 tests and fix the errors.
550```
551make test^quux^step3
552```
553
554You mal implementation is still basically just a numeric calculator
555with save/restore capability. But you have set the foundation for step
5564 where it will begin to feel like a real programming language.
557
558
559An aside on mutation and typing:
560
561The "!" suffix on symbols is used to indicate that this symbol refers
562to a function that mutates something else. In this case, the `def!`
563symbol indicates a special form that will mutate the current
564environment. Many (maybe even most) of runtime problems that are
565encountered in software engineering are a result of mutation. By
566clearly marking code where mutation may occur, you can more easily
567track down the likely cause of runtime problems when they do occur.
568
569Another cause of runtime errors is type errors, where a value of one
570type is unexpectedly treated by the program as a different and
571incompatible type. Statically typed languages try to make the
572programmer solve all type problems before the program is allowed to
573run. Most Lisp variants tend to be dynamically typed (types of values
574are checked when they are actually used at runtime).
575
576As an aside-aside: The great debate between static and dynamic typing
577debate can be understood by following the money. Advocates of strict
578static typing use words like "correctness" and "safety" and thus get
579government and academic funding. Advocates of dynamic typing use words
580like "agile" and "time-to-market" and thus get venture capital and
581commercial funding.
582
583
584<a name="step4"></a>
585
586### Step 4: If Fn Do
587
588![step4_if_fn_do architecture](step4_if_fn_do.png)
589
590In step 3 you added environments and the special forms for
591manipulating environments. In this step you will add 3 new special
592forms (`if`, `fn*` and `do`) and add several more core functions to
593the default REPL environment. Our new architecture will look like
594this:
595
596The `fn*` special form is how new user-defined functions are created.
597In some Lisps, this special form is named "lambda".
598
599Compare the pseudocode for step 3 and step 4 to get a basic idea of
600the changes that will be made during this step:
601```
602diff -urp ../process/step3_env.txt ../process/step4_if_fn_do.txt
603```
604
605* Copy `step3_env.qx` to `step4_if_fn_do.qx`.
606
607* If you have not implemented reader and printer support (and data
608 types) for `nil`, `true` and `false`, you will need to do so for
609 this step.
610
611* Update the constructor/initializer for environments to take two new
612 arguments: `binds` and `exprs`. Bind (`set`) each element (symbol)
613 of the binds list to the respective element of the `exprs` list.
614
615* Add support to `printer.qx` to print functions values. A string
616 literal like "#<function>" is sufficient.
617
618* Add the following special forms to `EVAL`.
619
620 * `do`: Evaluate the all the elements of the list and return the
621 final element (evaluated).
622 * `if`: Evaluate the first parameter (second element). If the result
623 (condition) is anything other than `nil` or `false`, then evaluate
624 the second parammeter (third element of the list) and return the
625 result. Otherwise, evaluate the third parameter (fourth element)
626 and return the result. If condition is false and there is no third
627 parameter, then just return `nil`.
628 * `fn*`: Return a new function closure. The body of that closure
629 does the following:
630 * Create a new environment using `env` (closed over from outer
631 scope) as the `outer` parameter, the first parameter (second
632 list element of `ast` from the outer scope) as the `binds`
633 parameter, and the parameters to the closure as the `exprs`
634 parameter.
635 * Call `EVAL` on the second parameter (third list element of `ast`
636 from outer scope), using the new environment. Use the result as
637 the return value of the closure.
638
639If your target language does not support closures, then you will need
640to implement `fn*` using some sort of structure or object that stores
641the values being closed over: the first and second elements of the
642`ast` list (function parameter list and function body) and the current
643environment `env`. In this case, your native functions will need to be
644wrapped in the same way. You will probably also need a method/function
645that invokes your function object/structure for the default case of
646the apply section of `EVAL`.
647
648Try out the basic functionality you have implemented:
649
650 * `(fn* [a] a)` -> `#<function>`
651 * `( (fn* [a] a) 7)` -> `7`
652 * `( (fn* [a] (+ a 1)) 10)` -> `11`
653 * `( (fn* [a b] (+ a b)) 2 3)` -> `5`
654
655* Add a new file `core.qx` and define an associative data structure
656 `ns` (namespace) that maps symbols to functions. Move the numeric
657 function definitions into this structure.
658
659* Modify `step4_if_fn_do.qx` to iterate through the `core.ns`
660 structure and add (`set`) each symbol/function mapping to the
661 REPL environment (`repl_env`).
662
663* Add the following functions to `core.ns`:
664 * `list`: take the parameters and return them as a list.
665 * `list?`: return true if the first parameter is a list, false
666 otherwise.
667 * `empty?`: treat the first parameter as a list and return true if
668 the list is empty and false if it contains any elements.
669 * `count`: treat the first parameter as a list and return the number
670 of elements that it contains.
671 * `=`: compare the first two parameters and return true if they are
672 the same type and contain the same value. In the case of equal
673 length lists, each element of the list should be compared for
674 equality and if they are the same return true, otherwise false.
675 * `<`, `<=`, `>`, and `>=`: treat the first two parameters as
676 numbers and do the corresponding numeric comparison, returning
677 either true or false.
678
679Now go to the top level, run the step 4 tests. There are a lot of
680tests in step 4 but all of the non-optional tests that do not involve
681strings should be able to pass now.
682
683```
684make test^quux^step4
685```
686
687Your mal implementation is already beginning to look like a real
688language. You have flow control, conditionals, user-defined functions
689with lexical scope, side-effects (if you implement the string
690functions), etc. However, our little interpreter has not quite reach
691Lisp-ness yet. The next several steps will take
692
693#### Optional:
694
695* Implement Clojure-style variadic function parameters. Modify the
696 constructor/initializer for environments, so that if a "&" symbol is
697 encountered in the `binds` list, the next symbol in the `binds` list
698 after the "&" is bound to the rest of the `exprs` list that has not
699 been bound yet.
700
701* Defines a `not` function using mal itself. In `step4_if_fn_do.qx`
702 call the `rep` function with this string:
703 "(def! not (fn* (a) (if a false true)))".
704
705* Implement the strings functions in `core.qx`. To implement these
706 functions, you will need to implement the string support in the
707 reader and printer (optional section of step 1). Each of the string
708 functions takes multiple mal values, prints them (`pr_str`) and
709 joins them together into a new string.
710 * `pr-str`: calls `pr_str` on each argument with `print_readably`
711 set to true, joins the results with " " and returns the new
712 string.
713 * `str`: calls `pr_str` on each argument with `print_readably` set
714 to false, concatenates the results together ("" separator), and
715 returns the new string.
716 * `prn`: calls `pr_str` on each argument with `print_readably` set
717 to true, joins the results with " ", prints the string to the
718 screen and then returns `nil`.
719 * `println`: calls `pr_str` on each argument with `print_readably` set
720 to false, joins the results with " ", prints the string to the
721 screen and then returns `nil`.
722
723
724<a name="step5"></a>
725
726### Step 5: Tail call optimization
727
728![step5_tco architecture](step5_tco.png)
729
730In step 4 you added special forms `do`, `if` and `fn*` and you defined
731some core functions. In this step you will add a Lisp feature called
732tail call optimization (TCO). Also called "tail recursion" or
733sometimes just "tail calls".
734
735Several of the special forms that you have defined in `EVAL` end up
736calling back into `EVAL`. For those forms that call `EVAL` as the last
737thing that they do before returning (tail call) you will just loop back
738to the beginning of eval rather than calling it again. The advantage
739of this approach is that it avoids adding more frames to the call
740stack. This is especially important in Lisp languages because they do
741not tend to have iteration control structures preferring recursion
742instead. However, with tail call optimization, recursion can be made
743as stack efficient as iteration.
744
745Compare the pseudocode for step 4 and step 5 to get a basic idea of
746the changes that will be made during this step:
747```
748diff -urp ../process/step4_if_fn_do.txt ../process/step5_tco.txt
749```
750
751* Copy `step4_env.qx` to `step5_tco.qx`.
752
753* Add a loop (e.g. while true) around all code in `EVAL`.
754
755* Modify each of the following form cases to add tail call recursion
756 support:
757 * `let*`: remove the final `EVAL` call on the second `ast` argument
758 (third list element). Set `env` (i.e. the local variable passed in
759 as second parameter of `EVAL`) to the new let environment. Set
760 `ast` (i.e. the local variable passed in as first parameter of
761 `EVAL`) to be the second `ast` argument. Continue at the beginning
762 of the loop (no return).
763 * `do`: change the `eval_ast` call to evaluate all the parameters
764 the except for the last (2nd list element up to but not
765 including last). Set `ast` to the last element of `ast`. Continue
766 at the beginning of the loop (`env` stays unchanged).
767 * `if`: the condition continues to be evaluated, however, rather
768 than evaluating the true or false branch, `ast` is set to the
769 unevaluated value of the chosen branch. Continue at the beginning
770 of the loop (`env` is unchanged).
771
772* The return value from the `fn*` special form will now become an
773 object/structure with attributes that allow the default invoke case
774 of `EVAL` to do TCO on mal functions. Those attributes are:
775 * `fn`: the original function value return in step 4
776 * `ast`: the second `ast` argument (third list element) representing
777 the body of the function.
778 * `params`: the first `ast` argument (second list element)
779 representing the parameter names of the function.
780 * `env`: the current value of the `env` parameter of `EVAL`.
781
782* The default "apply"/invoke case of `EVAL` must now be changed to
783 account for the new object/structure returned by the `fn*` form.
784 Continue to call `eval_ast` on `ast`. The first element is `f`.
785 Switch on the type of `f`:
786 * regular function (not one defined by `fn*`): apply/invoke it as
787 * before (in step 4).
788 * a `fn*` value: set `ast` to the `ast` attribute of `f`. Generate
789 a new environment using the `env` and `params` attributes of `f`
790 as the `outer` and `binds` arguments and rest `ast` arguments
791 (list elements 2 through the end) as the `exprs` argument. Set
792 `env` to the new environment. Continue at the beginning of the loop.
793
794Run some manual tests from previous steps to make sure you have not
795broken anything by adding TCO.
796
797Now go to the top level, run the step 5 tests.
798
799```
800make test^quux^step5
801```
802
803Look at the step 5 test file `tests/step5_tco.mal`. The `sum-to`
804function cannot be tail call optimized because it does something after
805the recursive call (`sum-to` calls itself and then does the addition).
806Lispers say that the `sum-to` is not in tail position. The `sum2`
807function however, calls itself from tail position. In other words, the
808recursive call to `sum2` is the last action that `sum2` does. Calling
809`sum-to` with a large value will cause a stack overflow exception in
810most target languages (some have super-special tricks they use to
811avoid stack overflows).
812
813Congratulations, your mal implementation already has a feature (TCO)
814that most mainstream languages lack.
815
816
817<a name="step6"></a>
818
819### Step 6: Files and Evil
820
821![step6_file architecture](step6_file.png)
822
823In step 5 you added tail call optimization. In this step you will add
824some string and file operations and give your implementation a touch
825of evil ... er, eval. And as long as your language supports function
826closures, this step will be quite simple. However, to complete this
827step, you must implement string type support, so if you have been
828holding off on that you will need to go back and do so.
829
830Compare the pseudocode for step 5 and step 6 to get a basic idea of
831the changes that will be made during this step:
832```
833diff -urp ../process/step5_tco.txt ../process/step6_file.txt
834```
835
ffd31966
JM
836* Copy `step5_tco.qx` to `step6_file.qx`.
837
0f4ca9d1
JM
838* Add two new string functions to the core namespaces:
839 * `read-string`: this function just exposes the `read_str` function
840 from the reader. If your mal string type is not the same as your
841 target language (e.g. statically typed language) then your
842 `read-string` function will need to unbox (extract) the raw string
843 from the mal string type in order to call `read_str`.
844 * `slurp`: this function takes a file name (string) and returns the
845 contents of the file as a string. Once again, if your mal string
846 type wraps a raw target language string, then you will need to
847 unmarshall (extract) the string parameter to get the raw file name
848 string and marshall (wrap) the result back to a mal string type.
849
850* In your main program, add a new `eval` (symbol) entry to your REPL
851 environment. The value of the new entry is a regular function
852 closure with a single argument `ast`. The closure calls the real
853 `EVAL` function using the `ast` as the first argument and the REPL
854 environment (closed over from outside) as the second argument. The
855 result of the `EVAL` call is returned.
856
857* Define a `load-file` function using mal itself. In your main
858 program call the `rep` function with this string:
859 "(def! load-file (fn* (f) (eval (read-string (str \"(do \" (slurp f) \")\")))))".
860
861Try out `load-file`:
862 * `(load-file "../tests/incA.mal")` -> `9`
863 * `(inc4 3)` -> `7`
864
865The `load-file` function does the following:
866 * Call `slurp` to read in a file by name. Surround the contents with
867 "(do ...)" so that the whole file will be treated as a single
868 program AST (abstract syntax tree).
869 * Call `read-string` on the string returned from `slurp`. This uses
870 the reader to read/convert the file contents into mal data/AST.
871 * Call `eval` (the one in the REPL environment) on the AST returned
872 from `read-string` to "run" it.
873
874Now go to the top level, run the step 6 tests. The optional tests will
875need support from the reader for comments, vectors and hash-maps:
876```
877make test^quux^step6
878```
879
880Congratulations, you now have a full-fledged scripting language that
881can run other mal programs. However, the set of functions that are
882available (from `core.qx`) is fairly limited. The bulk of the
883functions you will add are described in step 9, but you will begin to
884flesh them out over the next few steps to support quoting (step 7) and
885macros (step 8).
886
887
888#### Optional:
889
890* Add the ability to run another mal program from the command line.
891 Prior to the REPL loop, check if your mal implementation is called
892 with command line arguments. If so, treat the first argument as
893 a filename and use `rep` to call `load-file` on that filename, and
894 finally exit/terminate execution.
895
896* Add the rest of the command line arguments to your REPL environment
897 so that programs that are run with `load-file` have access to their
898 calling environmnet. Add a new "*ARGV*" (symbol) entry to your REPL
899 environment. The value of this entry should be the rest of the
900 command line arguments as a mal list value.
901
902
903<a name="step7"></a>
904
905### Step 7: Quoting
906
907![step7_quote architecture](step7_quote.png)
908
909In step 7 you will add the special forms `quote` and `quasiquote` and
910add supporting core functions `cons` and `concat`. The two quote forms
911add a powerful abstraction for manipulating mal code itself
912(meta-programming).
913
0f4ca9d1
JM
914The `quote` special form indicates to the evaluator (`EVAL`) that the
915parameter should not be evaluated (yet). At first glance, this might
916not seem particular useful but an example of what this enables is the
917ability for a mal program to refer to a symbol itself rather than the
918value that it evaluates to. Likewise with lists. For example, consider
919the following:
920
921* `(prn abc)`: this will lookup the symbol `abc` in the current
922 evaluation environment and print it. This will result in error if
923 `abc` is not defined.
924* `(prn (quote abc))`: this will print "abc" (prints the symbol
925 itself). This will work regardless of whether `abc` is defined in
926 the current environment.
927* `(prn (1 2 3))`: this will result in an error because `1` is not
928 a function and cannot be applied to the arguments `(2 3)`.
929* `(prn (quote (1 2 3)))`: this will print "(1 2 3)".
930* `(def! l (quote (1 2 3)))`: list quoting allows us to define lists
931 directly in the code (list literal). Another way of doing this is
932 with the list function: `(def! l (list 1 2 3))`.
933
934The second special quoting form is `quasiquote`. This allows a quoted
935list to have internal elements of the list that are temporarily
936unquoted (normal evaluation). There are two special forms that only
937mean something within a quasiquoted list: `unquote` and
938`splice-unquote`. These are perhaps best explained with some examples:
939
940* `(def! lst (quote (2 3)))` -> `(2 3)`
941* `(quasiquote (1 (unquote lst)))` -> `(1 (2 3))`
942* `(quasiquote (1 (splice-unquote lst)))` -> `(1 2 3)`
943
944The `unquote` form turns evaluation back on for its argument and the
945result of evaluation is put in place into the quasiquoted list. The
946`splice-unquote` also turns evaluation back on for its argument, but
947the evaluated value must be a list which is then "spliced" into the
948quasiquoted list. The true power of the quasiquote form will be
949manifest when it used together with macros (in the next step).
950
ffd31966
JM
951Compare the pseudocode for step 6 and step 7 to get a basic idea of
952the changes that will be made during this step:
953```
954diff -urp ../process/step6_file.txt ../process/step7_quote.txt
955```
956
957* Copy `step6_file.qx` to `step7_quote.qx`.
958
0f4ca9d1
JM
959* Before implementing the quoting forms, you will need to implement
960* some supporting functions in the core namespace:
961 * `cons`: this function takes a list as its second
962 parameter and returns a new list that has the first argument
963 prepended to it.
964 * `concat`: this functions takes 0 or more lists as
965 parameters and returns a new list that is a concatenation of all
966 the list parameters.
967
968An aside on immutability: note that neither cons or concat mutate
969their original list arguments. Any references to them (i.e. other
970lists that they may be "contained" in) will still refer to the
971original unchanged value. Mal, like Clojure, is a language which uses
972immutable data structures. I encourage you to read about the power and
973importance of immutability as implemented in Clojure (from which
974Mal borrows most of its syntax and feature-set).
975
976* Add the `quote` special form. This form just returns its argument
977 (the second list element of `ast`).
978
979* Add the `quasiquote` special form. First implement a helper function
980 `is_pair` that returns true if the parameter is a non-empty list.
981 Then define a `quasiquote` function. This is called from `EVAL` with
982 the first `ast` argument (second list element) and then `ast` is set
983 to the result and execution continues at the top of the loop (TCO).
984 The `quasiquote` function takes a parameter `ast` and has the
985 following conditional:
986 1. if `is_pair` of `ast` is false: return a new list containing:
987 a symbol named "quote" and `ast`.
988 2. else if the first element of `ast` is a symbol named "unquote":
989 return the second element of `ast`.
990 3. if `is_pair` of first element of `ast` is true and the first
991 element of first element of `ast` (`ast[0][0]`) is a symbol named
992 "splice-unquote": return a new list containing: a symbol named
993 "concat", the second element of first element of `ast`
994 (`ast[0][1]`), and the result of calling `quasiquote` with the
995 second through last element of `ast`.
996 4. otherwise: return a new list containing: a symbol named "cons", the
997 result of calling `quasiquote` on first element of `ast`
998 (`ast[0]`), and result of calling `quasiquote` with the second
999 through last element of `ast`.
1000
1001
1002Now go to the top level, run the step 7 tests:
1003```
1004make test^quux^step7
1005```
1006
1007Quoting is one of the more mundane functions available in mal, but do
1008not let that discourage you. Your mal implementation is almost
1009complete, and quoting sets the stage for the next very exiting step:
1010macros.
1011
1012
1013#### Optional
1014
1015* The full names for the quoting forms are fairly verbose. Most Lisp
1016 languages have a short-hand syntax and Mal is no exception. These
1017 short-hand syntaxes are known as reader macros because they allow us
1018 to manipulate mal code during the reader phase. Macros that run
1019 during the eval phase are just called "macros" and are described in
1020 the next section. Expand the conditional with reader `read_form`
1021 function to add the following four cases:
1022 * token is "'" (single quote): return a new list that contains the
1023 symbol "quote" and the result of reading the next form
1024 (`read_form`).
1025 * token is "`" (back-tick): return a new list that contains the
1026 symbol "quasiquote" and the result of reading the next form
1027 (`read_form`).
1028 * token is "~" (tilde): return a new list that contains the
1029 symbol "unquote" and the result of reading the next form
1030 (`read_form`).
1031 * token is "~@" (tilde + at sign): return a new list that contains
1032 the symbol "splice-unquote" and the result of reading the next
1033 form (`read_form`).
1034
1035* Add support for quoting of vectors. The `is_pair` function should
1036 return true if the argument is a non-empty list or vector. `cons`
1037 should also accept a vector as the second argument. The return value
1038 is a list regardless. `concat` should support concatenation of
1039 lists, vectors, or a mix or both. The result is always a list.
1040
1041
1042<a name="step8"></a>
1043
1044### Step 8: Macros
1045
1046![step8_macros architecture](step8_macros.png)
1047
1048Your mal implementation is now ready for one of the most Lispy and
1049exciting of all programming concepts: macros. In the previous step,
1050quoting enabled some simple manipulation data structures and therefore
1051manipulation of mal code (because the `eval` function from step
10526 turns mal data into code). In this step you will be able to mark mal
1053functions as macros which can manipulate mal code before it is
1054evaluated. In other words, macros are user-defined special forms. Or
1055to look at it another way, macros allow mal programs to redefine
1056the mal language itself.
1057
1058Compare the pseudocode for step 7 and step 8 to get a basic idea of
1059the changes that will be made during this step:
1060```
1061diff -urp ../process/step7_quote.txt ../process/step8_macros.txt
1062```
1063
ffd31966
JM
1064* Copy `step7_quote.qx` to `step8_macros.qx`.
1065
1066
0f4ca9d1
JM
1067You might think that the infinite power of macros would require some
1068sort of complex mechanism, but the implementation is actually fairly
1069simple.
1070
1071* Add a new attribute `is_macro` to mal function types. This should
1072 default to false.
1073
1074* Add a new special form `defmacro!`. This is very similar to the
1075 `def!` form, but before the evaluated value (mal function) is set in
1076 the environment, the `is_macro` attribute should be set to true.
1077
1078* Add a `is_macro_call` function: This function takes arguments `ast`
1079 and `env`. It returns true if `ast` is a list that contains a symbol
1080 as the first element and that symbol refers to a function in the
1081 `env` environment and that function has the `is_macro` attribute set
1082 to true. Otherwise, it returns false.
1083
1084* Add a `macroexpand` function: This function takes arguments `ast`
1085 and `env`. It calls `is_macro_call` with `ast` and `env` and loops
1086 while that condition is true. Inside the loop, the first element of
1087 the `ast` list (a symbol), is looked up in the environment to get
1088 the macro function. This macro function is then called/applied with
1089 the rest of the `ast` elements (2nd through the last) as arguments.
1090 The return value of the macro call becomes the new value of `ast`.
1091 When the loop completes because `ast` no longer represents a macro
1092 call, the current value of `ast` is returned.
1093
1094* In the evaluator (`EVAL`) before the special forms switch (apply
1095 section), perform macro expansion by calling the `macroexpand`
1096 function with the current value of `ast` and `env`. Set `ast` to the
1097 result of that call. If the new value of `ast` is no longer a list
1098 after macro expansion, then return `ast`, otherwise continue with
1099 the rest of the apply section (special forms switch).
1100
1101* Add a new special form condition for `macroexpand`. Call the
1102 `macroexpand` function using the first `ast` argument (second list
1103 element) and `env`. Return the result. This special form allows
1104 a mal program to do explicit macro expansion without applying the
1105 result (which can be useful for debugging macro expansion).
1106
1107Now go to the top level, run the step 8 tests:
1108```
1109make test^quux^step8
1110```
1111
1112
1113### Optional
1114
1115* Add the following new core functions which are frequently used in
1116 macro functions:
1117 * `nth`: this function takes a list (or vector) and a number (index)
1118 as arguments, returns the element of the list at the given index.
1119 If the index is out of range, this function raises an exception.
1120 * `first`: this function takes a list (or vector) as its argument
1121 and return the first element. If the list (or vector) is empty or
1122 is `nil` then `nil` is returned.
1123 * `rest`: this function takes a list (or vector) as its argument and
1124 returns a new list containing all the elements except the first.
1125
1126* In the main program, use the `rep` function to define two new
1127 control structures macros. Here are the string arguments for `rep`
1128 to define these macros:
1129 * `cond`: "(defmacro! cond (fn* (& xs) (if (> (count xs) 0) (list 'if (first xs) (if (> (count xs) 1) (nth xs 1) (throw \"odd number of forms to cond\")) (cons 'cond (rest (rest xs)))))))"
1130 * `or`: "(defmacro! or (fn* (& xs) (if (empty? xs) nil (if (= 1 (count xs)) (first xs) `(let* (or_FIXME ~(first xs)) (if or_FIXME or_FIXME (or ~@(rest xs))))))))"
1131
1132
ffd31966
JM
1133<a name="step9"></a>
1134
1135### Step 9: Try
1136
1137![step9_try architecture](step9_try.png)
1138
1139Compare the pseudocode for step 8 and step 9 to get a basic idea of
1140the changes that will be made during this step:
1141```
1142diff -urp ../process/step8_macros.txt ../process/step9_try.txt
1143```
1144
1145* Copy `step8_macros.qx` to `step9_try.qx`.
1146
1147* TODO/TBD
1148
1149
1150<a name="step9"></a>
1151
1152### Step A: Interop and Self-hosting
1153
1154![stepA_interop architecture](stepA_interop.png)
1155
1156Compare the pseudocode for step 9 and step A to get a basic idea of
1157the changes that will be made during this step:
1158```
1159diff -urp ../process/step9_try.txt ../process/stepA_interop.txt
1160```
1161
1162* Copy `step9_try.qx` to `stepA_interop.qx`.
1163
1164* TODO/TBD
1165
1166
0f4ca9d1
JM
1167## TODO:
1168
1169* simplify: "X argument (list element Y)" -> ast[Y]
1170* step 8 summary (power of macros, warning about macros, almost to
1171 self-hosting)
1172* step 9
1173* step A
1174* more info on hash-map and keyword implementation. Hash-maps just
1175 need to support string keys.
1176* list of types with metadata: list, vector, hash-map, mal functions
1177* more clarity about when to peek and poke in read_list and read_form
1178* tokenizer: use first group rather than whole match (to eliminate
1179 whitespace/commas)