All: move metadata, atoms, readline, conj to stepA.
[jackhill/mal.git] / process / guide.md
CommitLineData
0f4ca9d1
JM
1# The Make-A-Lisp Process
2
3So you want to write a Lisp interpreter? Welcome!
4
5The goal of the Make-A-Lisp project is to make it easy to write your
6own Lisp interpreter without sacrificing those many "Aha!" moments
7that come from ascending the McCarthy mountain. When you reach the peak
8of this particular mountain, you will have an interpreter for the mal
9Lisp language that is powerful enough to be self-hosting, meaning it
10will be able to run a mal interpreter written in mal itself.
11
12So jump right in (er ... start the climb)!
13
14## Pick a language
15
16You might already have a language in mind that you want to use.
17Technically speaking, mal can be implemented in any sufficiently
c10dcb94 18complete programming language (i.e. Turing complete), however, there are a few
0f4ca9d1
JM
19language features that can make the task MUCH easier. Here are some of
20them in rough order of importance:
21
22* A sequential compound data structure (e.g. arrays, lists,
23 vectors, etc)
24* An associative compound data structure (e.g. a dictionary,
25 hash-map, associative array, etc)
26* Function references (first class functions, function pointers,
27 etc)
28* Real exception handling (try/catch, raise, throw, etc)
29* Variable argument functions (variadic, var args, splats, apply, etc)
30* Function closures
31* PCRE regular expressions
32
33In addition, the following will make your task especially easy:
34
bf518367
JM
35* Dynamic typing / boxed types (specifically, the ability to store
36 different data types in the sequential and associative structures
37 and the language keeps track of the type for you)
0f4ca9d1
JM
38* Compound data types support arbitrary runtime "hidden" data
39 (metadata, metatables, dynamic fields attributes)
40
41Here are some examples of languages that have all of the above
42features: JavaScript, Ruby, Python, Lua, R, Clojure.
43
44Many of the most popular languages already have Mal implementations.
45However, this should not discourage you from creating your own
46implementation in a language that already has one. However, if you go
47this route, I suggest you avoid referring to the existing
48implementations (i.e. "cheating") to maximize your learning experience
49instead of just borrowing mine. On the other hand, if your goal is to
50add new implementations to mal as efficiently as possible, then you
51SHOULD find the most similar target language implementation and refer
52to it frequently.
53
bf518367
JM
54If you want a fairly long list of programming languages with an
55approximate measure of popularity, try the [Programming Language
56Popularity Chart](http://langpop.corger.nl/)
57
0f4ca9d1
JM
58
59## Getting started
60
61* Install your chosen language interpreter/compiler, language package
62 manager and build tools (if applicable)
63
64* Fork the mal repository on github and then clone your forked
65 repository:
66```
67git clone git@github.com:YOUR_NAME/mal.git
68cd mal
69```
70
fc03712f
EC
71* Make a new directory for your implementation. For example, if your
72language is called "quux":
0f4ca9d1
JM
73```
74mkdir quux
75```
76
77* Modify the top level Makefile to allow the tests to be run against
78 your implementation. For example, if your language is named "quux"
79 and uses "qx" as the file extension, then make the following
80 3 modifications to Makefile:
81```
82IMPLS = ... quux ...
83...
84quux_STEP_TO_PROG = mylang/$($(1)).qx
85...
86quux_RUNSTEP = ../$(2) $(3)
87```
88
89This allows you to run tests against your implementation like this:
90```
91make test^quux^stepX
92```
93
94
95## General hints
96
97Stackoverflow and Google are your best friends. Modern polyglot
98developers do not memorize dozens of programming languages. Instead,
99they learn the peculiar terminology used with each language and then
100use this to search for their answers.
101
102Here are some other resources where multiple languages are
103compared/described:
104* http://learnxinyminutes.com/
105* http://hyperpolyglot.org/
106* http://rosettacode.org/
334a71b6 107* http://rigaux.org/language-study/syntax-across-languages/
0f4ca9d1
JM
108
109Do not let yourself be bogged down by specific problems. While the
110make-a-lisp process is structured as a series of steps, the reality is
111that building a lisp interpreter is more like a branching tree. If you
112get stuck on tail call optimization, or hash-maps, move on to other
113things. You will often have a stroke of inspiration for a problem as
114you work through other functionality. I have tried to structure this
45a8b3ca
JM
115guide and the tests to make clear which things can be deferred until
116later.
117
118An aside on deferrable/optional bits: when you run the tests for
119a given step, the last tests are often marked with an "optional"
120header. This indicates that these are tests for functionality that is
121not critical to finish a basic mal implementation. Many of the steps
122in this process guide have a "Deferrable" section, however, it is not
123quite the same meaning. Those sections include the functionality that
124is marked as optional in the tests, but they also include
0f4ca9d1
JM
125functionality that becomes mandatory at a later step. In other words,
126this is a "make your own Lisp adventure".
127
128Use test driven development. Each step of the make-a-lisp process has
129a bunch of tests associated with it and there is an easy script to run
130all the tests for a specific step in the process. Pick a failing test,
131fix it, repeat until all the tests for that step pass.
132
133The `process` directory contains abbreviated pseudocode and
134architecture images for each step of the make-a-lisp process. Use
135a textual diff/comparison tool to compare the previous pseudocode step
136with the one you are working on. The architecture images have changes
137from the previous step highlighted in red.
138
daa1cf3f
JM
139If you get completely stuck and are feeling like giving up, then you
140should "cheat" by referring to the same step or functionality in
141a existing implementation language. You are here to learn, not to take
142a test, so do not feel bad about it. Okay, you should feel a little
143bit bad about it.
0f4ca9d1
JM
144
145
146## The Make-A-Lisp Process
147
148In the steps that follow the name of the target language is "quux" and
149the file extension for that language is "qx".
150
151
152<a name="step0"></a>
153
154### Step 0: The REPL
155
156![step0_repl architecture](step0_repl.png)
157
158This step is basically just creating a skeleton of your interpreter.
159
160* Create a `step0_repl.qx` file in `quux/`.
161
162* Add the 4 trivial functions `READ`, `EVAL`, `PRINT`, and `rep`
163 (read-eval-print). `READ`, `EVAL`, and `PRINT` are basically just
164 stubs that return their first parameter (a string if your target
165 language is a statically typed) and `rep` calls them in order
166 passing the return to the input of the next.
167
58ba5af4
JM
168* Add a main loop that repeatedly prints a prompt (needs to be
169 "user> " for later tests to pass), gets a line of input from the
170 user, calls `rep` with that line of input, and then prints out the
171 result from `rep`. It should also exit when you send it an EOF
172 (often Ctrl-D).
0f4ca9d1
JM
173
174* If you are using a compiled (ahead-of-time rather than just-in-time)
175 language, then create a Makefile (or appropriate project definition
176 file) in your directory.
177
daa1cf3f
JM
178It is time to run your first tests. This will check that your program
179does input and output in a way that can be captured by the test
180harness. Go to the top level and run the following:
181```
182make test^quux^step0
183```
0f4ca9d1
JM
184
185Add and then commit your new `step0_repl.qx` and `Makefile` to git.
186
187Congratulations! You have just completed the first step of the
188make-a-lisp process.
189
190
45a8b3ca 191#### Deferrable:
0f4ca9d1
JM
192
193* Add full line editing and command history support to your
194 interpreter REPL. Many languages have a library/module that provide
195 line editing support. Another option if your language supports it is
196 to use an FFI (foreign function interface) to load and call directly
197 into GNU readline, editline, or libnoise library. Add line
198 editing interface code to `readline.qx`
199
200
201<a name="step1"></a>
202
203### Step 1: Read and Print
204
205![step1_read_print architecture](step1_read_print.png)
206
207In this step, your interpreter will "read" the string from the user
208and parse it into an internal tree data structure (an abstract syntax
209tree) and then take that data structure and "print" it back to
210a string.
211
212In non-lisp languages, this step (called "lexing and parsing") can be
213one of the most complicated parts of the compiler/interpreter. In
214Lisp, the data structure that you want in memory is basically
215represented directly in the code that the programmer writes
216(homoiconicity).
217
218For example, if the string is "(+ 2 (* 3 4))" then the read function
219will process this into a tree structure that looks like this:
220```
221 List
222 / | \
223 / | \
224 / | \
225 Sym:+ Int:2 List
226 / | \
227 / | \
228 / | \
229 Sym:* Int:3 Int:4
230```
231
232Each left paren and its matching right paren (lisp "sexpr") becomes
233a node in the tree and everything else becomes a leaf in the tree.
234
235If you can find code for an implementation of a JSON encoder/decoder
236in your target language then you can probably just borrow and modify
237that and be 75% of the way done with this step.
238
239The rest of this section is going to assume that you are not starting
240from an existing JSON encoder/decoder, but that you do have access to
241a Perl compatible regular expressions (PCRE) module/library. You can
242certainly implement the reader using simple string operations, but it
243is more involved. The `make`, `ps` (postscript) and Haskell
244implementations have examples of a reader/parser without using regular
245expression support.
246
247* Copy `step0_repl.qx` to `step1_read_print.qx`.
248
249* Add a `reader.qx` file to hold functions related to the reader.
250
251* If the target language has objects types (OOP), then the next step
252 is to create a simple stateful Reader object in `reader.qx`. This
253 object will store the tokens and a position. The Reader object will
254 have two methods: `next` and `peek`. `next` returns the tokens at
255 the current position and increments the position. `peek` just
256 returns the token at the current position.
257
258* Add a function `read_str` in `reader.qx`. This function
259 will call `tokenizer` and then create a new Reader object instance
260 with the tokens. Then it will call `read_form` with the Reader
261 instance.
262
263* Add a function `tokenizer` in `reader.qx`. This function will take
2315fd53 264 a single string and return an array/list
0f4ca9d1
JM
265 of all the tokens (strings) in it. The following regular expression
266 (PCRE) will match all mal tokens.
267```
268[\s,]*(~@|[\[\]{}()'`~^@]|"(?:\\.|[^\\"])*"|;.*|[^\s\[\]{}('"`,;)]*)
269```
270
271* Add the function `read_form` to `reader.qx`. This function
272 will peek at the first token in the Reader object and switch on the
273 first character of that token. If the character is a left paren then
274 `read_list` is called with the Reader object. Otherwise, `read_atom`
275 is called with the Reader Object. The return value from `read_form`
276 is a mal data type. If your target language is statically typed then
277 you will need some way for `read_form` to return a variant or
278 subclass type. For example, if your language is object oriented,
2315fd53 279 then you can define a top level MalType (in `types.qx`) that all
0f4ca9d1
JM
280 your mal data types inherit from. The MalList type (which also
281 inherits from MalType) will contains a list/array of other MalTypes.
282 If your language is dynamically typed then you can likely just
283 return a plain list/array of other mal types.
284
285* Add the function `read_list` to `reader.qx`. This function will
286 repeatedly call `read_form` with the Reader object until it
287 encounters a ')' token (if it reach EOF before reading a ')' then
288 that is an error). It accumulates the results into a List type. If
289 your language does not have a sequential data type that can hold mal
290 type values you may need to implement one (in `types.qx`). Note
291 that `read_list` repeatedly calls `read_form` rather than
292 `read_atom`. This mutually recursive defintion between `read_list`
293 and `read_form` is what allows lists to contain lists.
294
295* Add the function `read_atom` to `reader.qx`. This function will
296 look at the contents of the token and return the appropriate scalar
297 (simple/single) data type value. Initially, you can just implement
298 numbers (integers) and symbols . This will allow you to proceed
299 through the next couple of steps before you will need to implement
300 the other fundamental mal types: nil, true, false, and string. The
301 remaining mal types: keyword, vector, hash-map, and atom do not
302 need to be implemented until step 9 (but can be implemented at any
303 point between this step and that). BTW, symbols types are just an
304 object that contains a single string name value (some languages have
305 symbol types already).
306
307* Add a file `printer.qx`. This file will contain a single function
308 `pr_str` which does the opposite of `read_str`: take a mal data
309 structure and return a string representation of it. But `pr_str` is
310 much simpler and is basically just a switch statement on the type of
311 the input object:
312
313 * symbol: return the string name of the symbol
314 * number: return the number as a string
315 * list: iterate through each element of the list calling `pr_str` on
316 it, then join the results with a space separator, and surround the
317 final result with parens
318
319* Change the `READ` function in `step1_read_print.qx` to call
320 `reader.read_str` and the `PRINT` function to call `printer.pr_str`.
321 `EVAL` continues to simply return its input but the type is now
322 a mal data type.
323
324You now have enough hooked up to begin testing your code. You can
325manually try some simple inputs:
326 * `123` -> `123`
327 * ` 123 ` -> `123`
328 * `abc` -> `abc`
329 * ` abc ` -> `abc`
330 * `(123 456)` -> `(123 456)`
331 * `( 123 456 789 ) ` -> `(123 456 789)`
6a767b0d 332 * `( + 2 (* 3 4) ) ` -> `(+ 2 (* 3 4))`
0f4ca9d1
JM
333
334To verify that your code is doing more than just eliminating extra
335spaces (and not failing), you can instrument your `reader.qx` functions.
336
337Once you have gotten past those simple manual tests, it is time to run
338the full suite of step 1 tests. Go to the top level and run the
339following:
340```
341make test^quux^step1
342```
343
344Fix any test failures related to symbols, numbers and lists.
345
346Depending on the functionality of your target language, it is likely
347that you have now just completed one of the most difficult steps. It
348is down hill from here. The remaining steps will probably be easier
349and each step will give progressively more bang for the buck.
350
45a8b3ca 351#### Deferrable:
0f4ca9d1
JM
352
353
354* Add error checking to your reader functions to make sure parens
355 are properly matched. Catch and print these errors in your main
356 loop. If your language does not have try/catch style bubble up
357 exception handling, then you will need to add explicit error
358 handling to your code to catch and pass on errors without crashing.
359
360* Add support for the other basic data type to your reader and printer
361 functions: string, nil, true, and false. These become mandatory at
362 step 4. When a string is read, a slash followed by a doublequote is
943f21c4 363 translated into a plain doublequote character and a slash followed by
0f4ca9d1
JM
364 "n" is translated into a newline. To properly print a string (for
365 step 4 string functions), the `pr_str` function needs another
366 parameter called `print_readably`. When `print_readably` is true,
367 doublequotes and newlines are translated into their printed
368 representations (the reverse of the reader). The `PRINT` function in
369 the main program should call `pr_str` with print_readably set to
370 true.
371
372* Add support for the other mal types: keyword, vector, hash-map, and
373 atom. TODO/TBD
374 * keyword: just a string stored with unicode prefix (or char 127 if
375 no unicode support).
376 * vector: can be implemented with same underlying type as list if
58ba5af4 377 there is some mechanism for marking/distinguishing from a list.
0f4ca9d1
JM
378 * hash-map: only need to implement string keys (which enables
379 keyword keys since they are just special strings).
380
381* Add support for reader macros which are special forms that are
382 transformed into other forms during the read phase.
383
384* Add comment support to your reader. The tokenizer should ignore
385 tokens that start with ";". Your `read_str` function will need to
386 properly handle when the tokenizer returns no values. The simplest
387 way to do this is to return `nil` mal value. A cleaner option (that
388 does not print `nil` at the prompt is to throw a special exception
389 that causes the main loop to simply continue at the beginning of the
390 loop without calling `rep`.
391
392
393<a name="step2"></a>
394
395### Step 2: Eval
396
397![step2_eval architecture](step2_eval.png)
398
399In step 1 your mal interpreter was basically just a way to validate
400input and eliminate extraneous white space. In this step you will turn
401your interpreter into a simple number calculator by adding
402functionality to the evaluator (`EVAL`).
403
404Compare the pseudocode for step 1 and step 2 to get a basic idea of
405the changes that will be made during this step:
406```
407diff -urp ../process/step1_read_print.txt ../process/step2_eval.txt
408```
409
410* Copy `step1_read_print.qx` to `step2_eval.qx`.
411
412* Define a simple initial REPL environment. This environment is an
413 associative structure that maps symbols (or symbol names) to
414 numeric functions. For example, in python this would look something
415 like this:
416```
417repl_env = {'+': lambda a,b: a+b,
418 '-': lambda a,b: a-b,
419 '*': lambda a,b: a*b,
420 '/': lambda a,b: int(a/b)}
421```
422
423* Modify the `rep` function to pass the REPL environment as the second
424 parameter for the `EVAL` call.
425
426* Create a new function `eval_ast` which takes `ast` (mal data type)
427 and an associative structure (the environment from above).
428 `eval_ast` switches on the type of `ast` as follows:
429
430 * symbol: lookup the symbol in the environment structure and return
431 the value or raise an error no value is found
432 * list: return a new list that is the result of calling `EVAL` on
433 each of the members of the list
434 * otherwise just return the original `ast` value
435
436* Modify `EVAL` to check if the first parameter `ast` is a list.
437 * `ast` is not a list: then return the result of calling `eval_ast`
438 on it.
439 * `ast` is a list: call `eval_ast` to get a new evaluated list. Take
440 the first item of the evaluated list and call it as function using
441 the rest of the evaluated list as its arguments.
442
443If your target language does not have full variable length argument
444support (e.g. variadic, vararg, splats, apply) then you will need to
445pass the full list of arguments as a single parameter and split apart
446the individual values inside of every mal function. This is annoying,
447but workable.
448
449The process of taking a list and invoking or executing it to return
450something new is known in Lisp as the "apply" phase.
451
452Try some simple expressions:
453
454 * `(+ 2 3)` -> `5`
455 * `(+ 2 (* 3 4))` -> `14`
456
457The most likely challenge you will encounter is how to properly call
458a function references using an arguments list.
459
460Now go to the top level, run the step 2 tests and fix the errors.
461```
462make test^quux^step2
463```
464
465You now have a simple prefix notation calculator!
466
467
468<a name="step3"></a>
469
470### Step 3: Environments
471
472![step3_env architecture](step3_env.png)
473
474In step 2 you were already introduced to REPL environment (`repl_env`)
475where the basic numeric functions were stored and looked up. In this
476step you will add the ability to create new environments (`let*`) and
daa1cf3f 477modify existing environments (`def!`).
0f4ca9d1
JM
478
479A Lisp environment is an associative data structure that maps symbols (the
480keys) to values. But Lisp environments have an additional important
481function: they can refer to another environment (the outer
482environment). During environment lookups, if the current environment
483does not have the symbol, the lookup continues in the outer
484environment, and continues this way until the symbol is either found,
485or the outer environment is `nil` (the outermost environment in the
486chain).
487
488Compare the pseudocode for step 2 and step 3 to get a basic idea of
489the changes that will be made during this step:
490```
491diff -urp ../process/step2_eval.txt ../process/step3_env.txt
492```
493
d9c020b0 494* Copy `step2_eval.qx` to `step3_env.qx`.
0f4ca9d1
JM
495
496* Create `env.qx` to hold the environment definition.
497
498* Define an `Env` object that is instantiated with a single `outer`
499 parameter and starts with an empty associative data structure
500 property `data`.
501
502* Define three methods for the Env object:
503 * set: takes a symbol key and a mal value and adds to the `data`
504 structure
505 * find: takes a symbol key and if the current environment contains
506 that key then return the environment. If no key is found and outer
507 is not `nil` then call find (recurse) on the outer environment.
508 * get: takes a symbol key and uses the `find` method to locate the
509 environment with the key, then returns the matching value. If no
510 key is found up the outer chain, then throws/raises a "not found"
511 error.
512
169ddeb2 513* Update `step3_env.qx` to use the new `Env` type to create the
0f4ca9d1
JM
514 repl_env (with a `nil` outer value) and use the `set` method to add
515 the numeric functions.
516
517* Modify `eval_ast` to call the `get` method on the `env` parameter.
518
519* Modify the apply section of `EVAL` to switch on the first element of
520 the list:
521 * symbol "def!": call the set method of the current environment
522 (second parameter of `EVAL` called `env`) using the unevaluated
523 first parameter (second list element) as the symbol key and the
524 evaluated second parameter as the value.
525 * symbol "let*": create a new environment using the current
526 environment as the outer value and then use the first parameter as
0a63c9f9 527 a list of new bindings in the "let*" environment. Take the second
0f4ca9d1
JM
528 element of the binding list, call `EVAL` using the new "let*"
529 environment as the evaluation environment, then call `set` on the
0a63c9f9 530 "let*" environment using the first binding list element as the key
0f4ca9d1
JM
531 and the evaluated second element as the value. This is repeated
532 for each odd/even pair in the binding list. Note in particular,
533 the bindings earlier in the list can be referred to by later
534 bindings. Finally, the second parameter (third element) of the
535 original `let*` form is evaluated using the new "let*" environment
536 and the result is returned as the result of the `let*` (the new
537 let environment is discarded upon completion).
538 * otherwise: call `eval_ast` on the list and apply the first element
539 to the rest as before.
540
541`def!` and `let*` are Lisp "specials" (or "special atoms") which means
542that they are language level features and more specifically that the
543rest of the list elements (arguments) may be evaluated differently (or
544not at all) unlike the default apply case where all elements of the
545list are evaluated before the first element is invoked. Lists which
546contain a "special" as the first element are known as "special forms".
547The are special because the follow special evaluation rules.
548
549Try some simple environment tests:
550
551 * `(def! a 6)` -> `6`
552 * `a` -> `6`
553 * `(def! b (+ a 2))` -> `8`
554 * `(+ a b)` -> `14`
555 * `(let* (c 2) c)` -> `2`
556
557Now go to the top level, run the step 3 tests and fix the errors.
558```
559make test^quux^step3
560```
561
562You mal implementation is still basically just a numeric calculator
563with save/restore capability. But you have set the foundation for step
5644 where it will begin to feel like a real programming language.
565
566
567An aside on mutation and typing:
568
569The "!" suffix on symbols is used to indicate that this symbol refers
570to a function that mutates something else. In this case, the `def!`
571symbol indicates a special form that will mutate the current
572environment. Many (maybe even most) of runtime problems that are
573encountered in software engineering are a result of mutation. By
574clearly marking code where mutation may occur, you can more easily
575track down the likely cause of runtime problems when they do occur.
576
577Another cause of runtime errors is type errors, where a value of one
578type is unexpectedly treated by the program as a different and
579incompatible type. Statically typed languages try to make the
580programmer solve all type problems before the program is allowed to
581run. Most Lisp variants tend to be dynamically typed (types of values
582are checked when they are actually used at runtime).
583
584As an aside-aside: The great debate between static and dynamic typing
585debate can be understood by following the money. Advocates of strict
586static typing use words like "correctness" and "safety" and thus get
587government and academic funding. Advocates of dynamic typing use words
588like "agile" and "time-to-market" and thus get venture capital and
589commercial funding.
590
591
592<a name="step4"></a>
593
594### Step 4: If Fn Do
595
596![step4_if_fn_do architecture](step4_if_fn_do.png)
597
598In step 3 you added environments and the special forms for
599manipulating environments. In this step you will add 3 new special
600forms (`if`, `fn*` and `do`) and add several more core functions to
601the default REPL environment. Our new architecture will look like
602this:
603
604The `fn*` special form is how new user-defined functions are created.
605In some Lisps, this special form is named "lambda".
606
607Compare the pseudocode for step 3 and step 4 to get a basic idea of
608the changes that will be made during this step:
609```
610diff -urp ../process/step3_env.txt ../process/step4_if_fn_do.txt
611```
612
613* Copy `step3_env.qx` to `step4_if_fn_do.qx`.
614
615* If you have not implemented reader and printer support (and data
616 types) for `nil`, `true` and `false`, you will need to do so for
617 this step.
618
619* Update the constructor/initializer for environments to take two new
620 arguments: `binds` and `exprs`. Bind (`set`) each element (symbol)
621 of the binds list to the respective element of the `exprs` list.
622
623* Add support to `printer.qx` to print functions values. A string
624 literal like "#<function>" is sufficient.
625
626* Add the following special forms to `EVAL`.
627
628 * `do`: Evaluate the all the elements of the list and return the
629 final element (evaluated).
630 * `if`: Evaluate the first parameter (second element). If the result
631 (condition) is anything other than `nil` or `false`, then evaluate
632 the second parammeter (third element of the list) and return the
633 result. Otherwise, evaluate the third parameter (fourth element)
634 and return the result. If condition is false and there is no third
635 parameter, then just return `nil`.
636 * `fn*`: Return a new function closure. The body of that closure
637 does the following:
638 * Create a new environment using `env` (closed over from outer
639 scope) as the `outer` parameter, the first parameter (second
640 list element of `ast` from the outer scope) as the `binds`
641 parameter, and the parameters to the closure as the `exprs`
642 parameter.
643 * Call `EVAL` on the second parameter (third list element of `ast`
644 from outer scope), using the new environment. Use the result as
645 the return value of the closure.
646
647If your target language does not support closures, then you will need
648to implement `fn*` using some sort of structure or object that stores
649the values being closed over: the first and second elements of the
650`ast` list (function parameter list and function body) and the current
651environment `env`. In this case, your native functions will need to be
652wrapped in the same way. You will probably also need a method/function
653that invokes your function object/structure for the default case of
654the apply section of `EVAL`.
655
656Try out the basic functionality you have implemented:
657
658 * `(fn* [a] a)` -> `#<function>`
659 * `( (fn* [a] a) 7)` -> `7`
660 * `( (fn* [a] (+ a 1)) 10)` -> `11`
661 * `( (fn* [a b] (+ a b)) 2 3)` -> `5`
662
663* Add a new file `core.qx` and define an associative data structure
664 `ns` (namespace) that maps symbols to functions. Move the numeric
665 function definitions into this structure.
666
667* Modify `step4_if_fn_do.qx` to iterate through the `core.ns`
668 structure and add (`set`) each symbol/function mapping to the
669 REPL environment (`repl_env`).
670
671* Add the following functions to `core.ns`:
672 * `list`: take the parameters and return them as a list.
673 * `list?`: return true if the first parameter is a list, false
674 otherwise.
675 * `empty?`: treat the first parameter as a list and return true if
676 the list is empty and false if it contains any elements.
677 * `count`: treat the first parameter as a list and return the number
678 of elements that it contains.
679 * `=`: compare the first two parameters and return true if they are
680 the same type and contain the same value. In the case of equal
681 length lists, each element of the list should be compared for
682 equality and if they are the same return true, otherwise false.
683 * `<`, `<=`, `>`, and `>=`: treat the first two parameters as
684 numbers and do the corresponding numeric comparison, returning
685 either true or false.
686
687Now go to the top level, run the step 4 tests. There are a lot of
688tests in step 4 but all of the non-optional tests that do not involve
689strings should be able to pass now.
690
691```
692make test^quux^step4
693```
694
695Your mal implementation is already beginning to look like a real
696language. You have flow control, conditionals, user-defined functions
697with lexical scope, side-effects (if you implement the string
1380d8c1
JM
698functions), etc. However, our little interpreter has not quite reached
699Lisp-ness yet. The next several steps will take your implementation
700from a neat toy to a full featured language.
0f4ca9d1 701
45a8b3ca 702#### Deferrable:
0f4ca9d1
JM
703
704* Implement Clojure-style variadic function parameters. Modify the
705 constructor/initializer for environments, so that if a "&" symbol is
706 encountered in the `binds` list, the next symbol in the `binds` list
707 after the "&" is bound to the rest of the `exprs` list that has not
708 been bound yet.
709
710* Defines a `not` function using mal itself. In `step4_if_fn_do.qx`
711 call the `rep` function with this string:
712 "(def! not (fn* (a) (if a false true)))".
713
714* Implement the strings functions in `core.qx`. To implement these
715 functions, you will need to implement the string support in the
45a8b3ca 716 reader and printer (deferrable section of step 1). Each of the string
0f4ca9d1
JM
717 functions takes multiple mal values, prints them (`pr_str`) and
718 joins them together into a new string.
719 * `pr-str`: calls `pr_str` on each argument with `print_readably`
720 set to true, joins the results with " " and returns the new
721 string.
722 * `str`: calls `pr_str` on each argument with `print_readably` set
723 to false, concatenates the results together ("" separator), and
724 returns the new string.
725 * `prn`: calls `pr_str` on each argument with `print_readably` set
726 to true, joins the results with " ", prints the string to the
727 screen and then returns `nil`.
728 * `println`: calls `pr_str` on each argument with `print_readably` set
729 to false, joins the results with " ", prints the string to the
730 screen and then returns `nil`.
731
732
733<a name="step5"></a>
734
735### Step 5: Tail call optimization
736
737![step5_tco architecture](step5_tco.png)
738
739In step 4 you added special forms `do`, `if` and `fn*` and you defined
740some core functions. In this step you will add a Lisp feature called
741tail call optimization (TCO). Also called "tail recursion" or
742sometimes just "tail calls".
743
744Several of the special forms that you have defined in `EVAL` end up
745calling back into `EVAL`. For those forms that call `EVAL` as the last
746thing that they do before returning (tail call) you will just loop back
747to the beginning of eval rather than calling it again. The advantage
748of this approach is that it avoids adding more frames to the call
749stack. This is especially important in Lisp languages because they do
750not tend to have iteration control structures preferring recursion
751instead. However, with tail call optimization, recursion can be made
752as stack efficient as iteration.
753
754Compare the pseudocode for step 4 and step 5 to get a basic idea of
755the changes that will be made during this step:
756```
757diff -urp ../process/step4_if_fn_do.txt ../process/step5_tco.txt
758```
759
760* Copy `step4_env.qx` to `step5_tco.qx`.
761
762* Add a loop (e.g. while true) around all code in `EVAL`.
763
764* Modify each of the following form cases to add tail call recursion
765 support:
766 * `let*`: remove the final `EVAL` call on the second `ast` argument
767 (third list element). Set `env` (i.e. the local variable passed in
768 as second parameter of `EVAL`) to the new let environment. Set
769 `ast` (i.e. the local variable passed in as first parameter of
770 `EVAL`) to be the second `ast` argument. Continue at the beginning
771 of the loop (no return).
772 * `do`: change the `eval_ast` call to evaluate all the parameters
773 the except for the last (2nd list element up to but not
774 including last). Set `ast` to the last element of `ast`. Continue
775 at the beginning of the loop (`env` stays unchanged).
776 * `if`: the condition continues to be evaluated, however, rather
777 than evaluating the true or false branch, `ast` is set to the
778 unevaluated value of the chosen branch. Continue at the beginning
779 of the loop (`env` is unchanged).
780
781* The return value from the `fn*` special form will now become an
782 object/structure with attributes that allow the default invoke case
783 of `EVAL` to do TCO on mal functions. Those attributes are:
45a8b3ca
JM
784 * `fn`: the original function value return in step 4 (this is
785 actually deferrable until step 9 when it is needed for the `map`
786 and `apply` core functions).
0f4ca9d1
JM
787 * `ast`: the second `ast` argument (third list element) representing
788 the body of the function.
789 * `params`: the first `ast` argument (second list element)
790 representing the parameter names of the function.
791 * `env`: the current value of the `env` parameter of `EVAL`.
792
793* The default "apply"/invoke case of `EVAL` must now be changed to
794 account for the new object/structure returned by the `fn*` form.
795 Continue to call `eval_ast` on `ast`. The first element is `f`.
796 Switch on the type of `f`:
797 * regular function (not one defined by `fn*`): apply/invoke it as
798 * before (in step 4).
799 * a `fn*` value: set `ast` to the `ast` attribute of `f`. Generate
800 a new environment using the `env` and `params` attributes of `f`
801 as the `outer` and `binds` arguments and rest `ast` arguments
802 (list elements 2 through the end) as the `exprs` argument. Set
803 `env` to the new environment. Continue at the beginning of the loop.
804
805Run some manual tests from previous steps to make sure you have not
806broken anything by adding TCO.
807
808Now go to the top level, run the step 5 tests.
809
810```
811make test^quux^step5
812```
813
814Look at the step 5 test file `tests/step5_tco.mal`. The `sum-to`
815function cannot be tail call optimized because it does something after
816the recursive call (`sum-to` calls itself and then does the addition).
817Lispers say that the `sum-to` is not in tail position. The `sum2`
818function however, calls itself from tail position. In other words, the
819recursive call to `sum2` is the last action that `sum2` does. Calling
820`sum-to` with a large value will cause a stack overflow exception in
821most target languages (some have super-special tricks they use to
822avoid stack overflows).
823
824Congratulations, your mal implementation already has a feature (TCO)
825that most mainstream languages lack.
826
827
828<a name="step6"></a>
829
830### Step 6: Files and Evil
831
832![step6_file architecture](step6_file.png)
833
834In step 5 you added tail call optimization. In this step you will add
835some string and file operations and give your implementation a touch
836of evil ... er, eval. And as long as your language supports function
837closures, this step will be quite simple. However, to complete this
838step, you must implement string type support, so if you have been
839holding off on that you will need to go back and do so.
840
841Compare the pseudocode for step 5 and step 6 to get a basic idea of
842the changes that will be made during this step:
843```
844diff -urp ../process/step5_tco.txt ../process/step6_file.txt
845```
846
ffd31966
JM
847* Copy `step5_tco.qx` to `step6_file.qx`.
848
0f4ca9d1
JM
849* Add two new string functions to the core namespaces:
850 * `read-string`: this function just exposes the `read_str` function
851 from the reader. If your mal string type is not the same as your
852 target language (e.g. statically typed language) then your
853 `read-string` function will need to unbox (extract) the raw string
854 from the mal string type in order to call `read_str`.
855 * `slurp`: this function takes a file name (string) and returns the
856 contents of the file as a string. Once again, if your mal string
857 type wraps a raw target language string, then you will need to
858 unmarshall (extract) the string parameter to get the raw file name
859 string and marshall (wrap) the result back to a mal string type.
860
861* In your main program, add a new `eval` (symbol) entry to your REPL
862 environment. The value of the new entry is a regular function
863 closure with a single argument `ast`. The closure calls the real
864 `EVAL` function using the `ast` as the first argument and the REPL
865 environment (closed over from outside) as the second argument. The
866 result of the `EVAL` call is returned.
867
868* Define a `load-file` function using mal itself. In your main
869 program call the `rep` function with this string:
870 "(def! load-file (fn* (f) (eval (read-string (str \"(do \" (slurp f) \")\")))))".
871
872Try out `load-file`:
873 * `(load-file "../tests/incA.mal")` -> `9`
874 * `(inc4 3)` -> `7`
875
876The `load-file` function does the following:
877 * Call `slurp` to read in a file by name. Surround the contents with
878 "(do ...)" so that the whole file will be treated as a single
879 program AST (abstract syntax tree).
880 * Call `read-string` on the string returned from `slurp`. This uses
881 the reader to read/convert the file contents into mal data/AST.
882 * Call `eval` (the one in the REPL environment) on the AST returned
883 from `read-string` to "run" it.
884
885Now go to the top level, run the step 6 tests. The optional tests will
886need support from the reader for comments, vectors and hash-maps:
887```
888make test^quux^step6
889```
890
891Congratulations, you now have a full-fledged scripting language that
892can run other mal programs. However, the set of functions that are
893available (from `core.qx`) is fairly limited. The bulk of the
894functions you will add are described in step 9, but you will begin to
895flesh them out over the next few steps to support quoting (step 7) and
896macros (step 8).
897
898
45a8b3ca 899#### Deferrable:
0f4ca9d1
JM
900
901* Add the ability to run another mal program from the command line.
902 Prior to the REPL loop, check if your mal implementation is called
903 with command line arguments. If so, treat the first argument as
904 a filename and use `rep` to call `load-file` on that filename, and
905 finally exit/terminate execution.
906
907* Add the rest of the command line arguments to your REPL environment
908 so that programs that are run with `load-file` have access to their
909 calling environmnet. Add a new "*ARGV*" (symbol) entry to your REPL
910 environment. The value of this entry should be the rest of the
911 command line arguments as a mal list value.
912
913
914<a name="step7"></a>
915
916### Step 7: Quoting
917
918![step7_quote architecture](step7_quote.png)
919
920In step 7 you will add the special forms `quote` and `quasiquote` and
921add supporting core functions `cons` and `concat`. The two quote forms
922add a powerful abstraction for manipulating mal code itself
923(meta-programming).
924
0f4ca9d1
JM
925The `quote` special form indicates to the evaluator (`EVAL`) that the
926parameter should not be evaluated (yet). At first glance, this might
927not seem particular useful but an example of what this enables is the
928ability for a mal program to refer to a symbol itself rather than the
929value that it evaluates to. Likewise with lists. For example, consider
930the following:
931
932* `(prn abc)`: this will lookup the symbol `abc` in the current
933 evaluation environment and print it. This will result in error if
934 `abc` is not defined.
935* `(prn (quote abc))`: this will print "abc" (prints the symbol
936 itself). This will work regardless of whether `abc` is defined in
937 the current environment.
938* `(prn (1 2 3))`: this will result in an error because `1` is not
939 a function and cannot be applied to the arguments `(2 3)`.
940* `(prn (quote (1 2 3)))`: this will print "(1 2 3)".
941* `(def! l (quote (1 2 3)))`: list quoting allows us to define lists
942 directly in the code (list literal). Another way of doing this is
943 with the list function: `(def! l (list 1 2 3))`.
944
945The second special quoting form is `quasiquote`. This allows a quoted
946list to have internal elements of the list that are temporarily
947unquoted (normal evaluation). There are two special forms that only
948mean something within a quasiquoted list: `unquote` and
949`splice-unquote`. These are perhaps best explained with some examples:
950
951* `(def! lst (quote (2 3)))` -> `(2 3)`
952* `(quasiquote (1 (unquote lst)))` -> `(1 (2 3))`
953* `(quasiquote (1 (splice-unquote lst)))` -> `(1 2 3)`
954
955The `unquote` form turns evaluation back on for its argument and the
956result of evaluation is put in place into the quasiquoted list. The
957`splice-unquote` also turns evaluation back on for its argument, but
958the evaluated value must be a list which is then "spliced" into the
959quasiquoted list. The true power of the quasiquote form will be
960manifest when it used together with macros (in the next step).
961
ffd31966
JM
962Compare the pseudocode for step 6 and step 7 to get a basic idea of
963the changes that will be made during this step:
964```
965diff -urp ../process/step6_file.txt ../process/step7_quote.txt
966```
967
968* Copy `step6_file.qx` to `step7_quote.qx`.
969
0f4ca9d1
JM
970* Before implementing the quoting forms, you will need to implement
971* some supporting functions in the core namespace:
972 * `cons`: this function takes a list as its second
973 parameter and returns a new list that has the first argument
974 prepended to it.
975 * `concat`: this functions takes 0 or more lists as
976 parameters and returns a new list that is a concatenation of all
977 the list parameters.
978
979An aside on immutability: note that neither cons or concat mutate
980their original list arguments. Any references to them (i.e. other
981lists that they may be "contained" in) will still refer to the
982original unchanged value. Mal, like Clojure, is a language which uses
983immutable data structures. I encourage you to read about the power and
984importance of immutability as implemented in Clojure (from which
985Mal borrows most of its syntax and feature-set).
986
987* Add the `quote` special form. This form just returns its argument
988 (the second list element of `ast`).
989
990* Add the `quasiquote` special form. First implement a helper function
991 `is_pair` that returns true if the parameter is a non-empty list.
992 Then define a `quasiquote` function. This is called from `EVAL` with
993 the first `ast` argument (second list element) and then `ast` is set
994 to the result and execution continues at the top of the loop (TCO).
995 The `quasiquote` function takes a parameter `ast` and has the
996 following conditional:
997 1. if `is_pair` of `ast` is false: return a new list containing:
998 a symbol named "quote" and `ast`.
999 2. else if the first element of `ast` is a symbol named "unquote":
1000 return the second element of `ast`.
1001 3. if `is_pair` of first element of `ast` is true and the first
1002 element of first element of `ast` (`ast[0][0]`) is a symbol named
1003 "splice-unquote": return a new list containing: a symbol named
1004 "concat", the second element of first element of `ast`
1005 (`ast[0][1]`), and the result of calling `quasiquote` with the
1006 second through last element of `ast`.
1007 4. otherwise: return a new list containing: a symbol named "cons", the
1008 result of calling `quasiquote` on first element of `ast`
1009 (`ast[0]`), and result of calling `quasiquote` with the second
1010 through last element of `ast`.
1011
1012
1013Now go to the top level, run the step 7 tests:
1014```
1015make test^quux^step7
1016```
1017
1018Quoting is one of the more mundane functions available in mal, but do
1019not let that discourage you. Your mal implementation is almost
1020complete, and quoting sets the stage for the next very exiting step:
1021macros.
1022
1023
45a8b3ca 1024#### Deferrable
0f4ca9d1
JM
1025
1026* The full names for the quoting forms are fairly verbose. Most Lisp
1027 languages have a short-hand syntax and Mal is no exception. These
1028 short-hand syntaxes are known as reader macros because they allow us
1029 to manipulate mal code during the reader phase. Macros that run
1030 during the eval phase are just called "macros" and are described in
1031 the next section. Expand the conditional with reader `read_form`
1032 function to add the following four cases:
1033 * token is "'" (single quote): return a new list that contains the
1034 symbol "quote" and the result of reading the next form
1035 (`read_form`).
1036 * token is "`" (back-tick): return a new list that contains the
1037 symbol "quasiquote" and the result of reading the next form
1038 (`read_form`).
1039 * token is "~" (tilde): return a new list that contains the
1040 symbol "unquote" and the result of reading the next form
1041 (`read_form`).
1042 * token is "~@" (tilde + at sign): return a new list that contains
1043 the symbol "splice-unquote" and the result of reading the next
1044 form (`read_form`).
1045
1046* Add support for quoting of vectors. The `is_pair` function should
1047 return true if the argument is a non-empty list or vector. `cons`
1048 should also accept a vector as the second argument. The return value
1049 is a list regardless. `concat` should support concatenation of
1050 lists, vectors, or a mix or both. The result is always a list.
1051
1052
1053<a name="step8"></a>
1054
1055### Step 8: Macros
1056
1057![step8_macros architecture](step8_macros.png)
1058
1059Your mal implementation is now ready for one of the most Lispy and
1060exciting of all programming concepts: macros. In the previous step,
1061quoting enabled some simple manipulation data structures and therefore
1062manipulation of mal code (because the `eval` function from step
10636 turns mal data into code). In this step you will be able to mark mal
1064functions as macros which can manipulate mal code before it is
1065evaluated. In other words, macros are user-defined special forms. Or
1066to look at it another way, macros allow mal programs to redefine
1067the mal language itself.
1068
1069Compare the pseudocode for step 7 and step 8 to get a basic idea of
1070the changes that will be made during this step:
1071```
1072diff -urp ../process/step7_quote.txt ../process/step8_macros.txt
1073```
1074
ffd31966
JM
1075* Copy `step7_quote.qx` to `step8_macros.qx`.
1076
1077
0f4ca9d1
JM
1078You might think that the infinite power of macros would require some
1079sort of complex mechanism, but the implementation is actually fairly
1080simple.
1081
1082* Add a new attribute `is_macro` to mal function types. This should
1083 default to false.
1084
1085* Add a new special form `defmacro!`. This is very similar to the
1086 `def!` form, but before the evaluated value (mal function) is set in
1087 the environment, the `is_macro` attribute should be set to true.
1088
1089* Add a `is_macro_call` function: This function takes arguments `ast`
1090 and `env`. It returns true if `ast` is a list that contains a symbol
1091 as the first element and that symbol refers to a function in the
1092 `env` environment and that function has the `is_macro` attribute set
1093 to true. Otherwise, it returns false.
1094
1095* Add a `macroexpand` function: This function takes arguments `ast`
1096 and `env`. It calls `is_macro_call` with `ast` and `env` and loops
1097 while that condition is true. Inside the loop, the first element of
1098 the `ast` list (a symbol), is looked up in the environment to get
1099 the macro function. This macro function is then called/applied with
1100 the rest of the `ast` elements (2nd through the last) as arguments.
1101 The return value of the macro call becomes the new value of `ast`.
1102 When the loop completes because `ast` no longer represents a macro
1103 call, the current value of `ast` is returned.
1104
1105* In the evaluator (`EVAL`) before the special forms switch (apply
1106 section), perform macro expansion by calling the `macroexpand`
1107 function with the current value of `ast` and `env`. Set `ast` to the
1108 result of that call. If the new value of `ast` is no longer a list
1109 after macro expansion, then return `ast`, otherwise continue with
1110 the rest of the apply section (special forms switch).
1111
1112* Add a new special form condition for `macroexpand`. Call the
1113 `macroexpand` function using the first `ast` argument (second list
1114 element) and `env`. Return the result. This special form allows
1115 a mal program to do explicit macro expansion without applying the
1116 result (which can be useful for debugging macro expansion).
1117
1118Now go to the top level, run the step 8 tests:
1119```
1120make test^quux^step8
1121```
1122
8a98ef9a
JM
1123There is a reasonably good chance that the macro tests will not pass
1124the first time. Although the implementation of macros is fairly
1125simple, debugging runtime bugs with macros can be fairly tricky. If
1126you do run into subtle problems that are difficult to solve, let me
1127recommend a couple of approaches:
1128
1129* Use the macroexpand special form to eliminate one of the layers of
1130 indirection (to expand but skip evaluate). This will often reveal
1131 the source of the issue.
1132* Add a debug print statement to the top of your main `eval` function
1133 (inside the TCO loop) to print the current value of `ast` (hint use
1134 `pr_str` to get easier to debug output). Pull up the step8
1135 implementation from another language and uncomment its `eval`
1136 function (yes, I give you permission to violate the rule this once).
1137 Run the two side-by-side. The first difference is likely to point to
1138 the bug.
1139
1140Congratulations! You now have a Lisp interpreter with a super power
1141that most non-Lisp languages can only dream of (I have it on good
1142authority that languages dream when you are not using them). If you
1143are not already familiar with Lisp macros, I suggest the following
1144excercise: write a recursive macro that handles postfixed mal code
1145(with the function as the last parameter instead of the first). Or
1146not. I have not actually done so myself, but I have heard it is an
1147interesting excercise.
1148
1149In the next step you will add try/catch style exception handling to
1150your implementation in addition to some new core functions. After
1151step9 you will be very close to having a fully self-hosting mal
1152implementation. Let us continue!
1153
0f4ca9d1 1154
45a8b3ca 1155### Deferrable
0f4ca9d1
JM
1156
1157* Add the following new core functions which are frequently used in
1158 macro functions:
1159 * `nth`: this function takes a list (or vector) and a number (index)
1160 as arguments, returns the element of the list at the given index.
1161 If the index is out of range, this function raises an exception.
1162 * `first`: this function takes a list (or vector) as its argument
1163 and return the first element. If the list (or vector) is empty or
1164 is `nil` then `nil` is returned.
1165 * `rest`: this function takes a list (or vector) as its argument and
1166 returns a new list containing all the elements except the first.
1167
1168* In the main program, use the `rep` function to define two new
1169 control structures macros. Here are the string arguments for `rep`
1170 to define these macros:
1171 * `cond`: "(defmacro! cond (fn* (& xs) (if (> (count xs) 0) (list 'if (first xs) (if (> (count xs) 1) (nth xs 1) (throw \"odd number of forms to cond\")) (cons 'cond (rest (rest xs)))))))"
1172 * `or`: "(defmacro! or (fn* (& xs) (if (empty? xs) nil (if (= 1 (count xs)) (first xs) `(let* (or_FIXME ~(first xs)) (if or_FIXME or_FIXME (or ~@(rest xs))))))))"
1173
1174
ffd31966
JM
1175<a name="step9"></a>
1176
1177### Step 9: Try
1178
1179![step9_try architecture](step9_try.png)
1180
1181Compare the pseudocode for step 8 and step 9 to get a basic idea of
dbac60df 1182the changes that will be made during this step:
ffd31966
JM
1183```
1184diff -urp ../process/step8_macros.txt ../process/step9_try.txt
1185```
1186
1187* Copy `step8_macros.qx` to `step9_try.qx`.
1188
45a8b3ca 1189* TODO/TBD.
dbac60df 1190 * In step 5, if you did not add the original function (`fn`) to the
45a8b3ca
JM
1191 returned structure returned from `fn*`, the you will need to do so
1192 now.
ffd31966
JM
1193
1194
8569b2af 1195<a name="stepA"></a>
ffd31966
JM
1196
1197### Step A: Interop and Self-hosting
1198
90f618cb 1199![stepA_mal architecture](stepA_mal.png)
ffd31966
JM
1200
1201Compare the pseudocode for step 9 and step A to get a basic idea of
1202the changes that will be made during this step:
1203```
dbac60df 1204diff -urp ../process/step9_try.txt ../process/stepA_mal.txt
ffd31966
JM
1205```
1206
90f618cb 1207* Copy `step9_try.qx` to `stepA_mal.qx`.
ffd31966
JM
1208
1209* TODO/TBD
8569b2af 1210 * Self-hosted tests
ffd31966
JM
1211
1212
0f4ca9d1
JM
1213## TODO:
1214
1215* simplify: "X argument (list element Y)" -> ast[Y]
1216* step 8 summary (power of macros, warning about macros, almost to
1217 self-hosting)
1218* step 9
1219* step A
1220* more info on hash-map and keyword implementation. Hash-maps just
1221 need to support string keys.
1222* list of types with metadata: list, vector, hash-map, mal functions
1223* more clarity about when to peek and poke in read_list and read_form
1224* tokenizer: use first group rather than whole match (to eliminate
1225 whitespace/commas)