gensym: hide the counter in an environment, define inc in stepA.
[jackhill/mal.git] / process / guide.md
1 # The Make-A-Lisp Process
2
3 So you want to write a Lisp interpreter? Welcome!
4
5 The goal of the Make-A-Lisp project is to make it easy to write your
6 own Lisp interpreter without sacrificing those many "Aha!" moments
7 that come from ascending the McCarthy mountain. When you reach the peak
8 of this particular mountain, you will have an interpreter for the mal
9 Lisp language that is powerful enough to be self-hosting, meaning it
10 will be able to run a mal interpreter written in mal itself.
11
12 So jump right in (er ... start the climb)!
13
14 - [Pick a language](#pick-a-language)
15 - [Getting started](#getting-started)
16 - [General hints](#general-hints)
17 - [The Make-A-Lisp Process](#the-make-a-lisp-process-1)
18 - [Step 0: The REPL](#step-0-the-repl)
19 - [Step 1: Read and Print](#step-1-read-and-print)
20 - [Step 2: Eval](#step-2-eval)
21 - [Step 3: Environments](#step-3-environments)
22 - [Step 4: If Fn Do](#step-4-if-fn-do)
23 - [Step 5: Tail call optimization](#step-5-tail-call-optimization)
24 - [Step 6: Files, Mutation, and Evil](#step-6-files-mutation-and-evil)
25 - [Step 7: Quoting](#step-7-quoting)
26 - [Step 8: Macros](#step-8-macros)
27 - [Step 9: Try](#step-9-try)
28 - [Step A: Metadata, Self-hosting and Interop](#step-a-metadata-self-hosting-and-interop)
29
30
31 ## Pick a language
32
33 You might already have a language in mind that you want to use.
34 Technically speaking, mal can be implemented in any sufficiently
35 complete programming language (i.e. Turing complete), however, there
36 are a few language features that can make the task MUCH easier. Here
37 are some of them in rough order of importance:
38
39 * A sequential compound data structure (e.g. arrays, lists,
40 vectors, etc)
41 * An associative compound data structure (e.g. a dictionary,
42 hash-map, associative array, etc)
43 * Function references (first class functions, function pointers,
44 etc)
45 * Real exception handling (try/catch, raise, throw, etc)
46 * Variable argument functions (variadic, var args, splats, apply, etc)
47 * Function closures
48 * PCRE regular expressions
49
50 In addition, the following will make your task especially easy:
51
52 * Dynamic typing / boxed types (specifically, the ability to store
53 different data types in the sequential and associative structures
54 and the language keeps track of the type for you)
55 * Compound data types support arbitrary runtime "hidden" data
56 (metadata, metatables, dynamic fields attributes)
57
58 Here are some examples of languages that have all of the above
59 features: JavaScript, Ruby, Python, Lua, R, Clojure.
60
61 Michael Fogus has some great blog posts on interesting but less well
62 known languages and many of the languages on his lists do not yet have
63 any mal implementations:
64 * http://blog.fogus.me/2011/08/14/perlis-languages/
65 * http://blog.fogus.me/2011/10/18/programming-language-development-the-past-5-years/
66
67 Many of the most popular languages already have Mal implementations.
68 However, this should not discourage you from creating your own
69 implementation in a language that already has one. However, if you go
70 this route, I suggest you avoid referring to the existing
71 implementations (i.e. "cheating") to maximize your learning experience
72 instead of just borrowing mine. On the other hand, if your goal is to
73 add new implementations to mal as efficiently as possible, then you
74 SHOULD find the most similar target language implementation and refer
75 to it frequently.
76
77 If you want a list of programming languages with an
78 approximate measure of popularity try the [RedMonk Programming
79 Language
80 Rankings](https://redmonk.com/sogrady/2019/03/20/language-rankings-1-19/)
81 or the [GitHut 2.0 Project](https://madnight.github.io/githut).
82
83
84 ## Getting started
85
86 * Install your chosen language interpreter/compiler, language package
87 manager and build tools (if applicable)
88
89 * Fork the mal repository on github and then clone your forked
90 repository:
91 ```
92 git clone git@github.com:YOUR_NAME/mal.git
93 cd mal
94 ```
95
96 * Make a new directory for your implementation. For example, if your
97 language is called "quux":
98 ```
99 mkdir quux
100 ```
101
102 * Modify the top level Makefile to allow the tests to be run against
103 your implementation. For example, if your language is named "quux"
104 and uses "qx" as the file extension, then make the following
105 3 modifications to Makefile:
106 ```
107 IMPLS = ... quux ...
108 ...
109 quux_STEP_TO_PROG = mylang/$($(1)).qx
110 ```
111
112 * Add a "run" script to you implementation directory that listens to
113 the "STEP" environment variable for the implementation step to run
114 and defaults to "stepA_mal". Make sure the run script has the
115 executable file permission set (or else the test runner might fail with a
116 permission denied error message). The following are examples of "run"
117 scripts for a compiled language and an interpreted language (where
118 the interpreter is named "quux"):
119
120 ```
121 #!/bin/bash
122 exec $(dirname $0)/${STEP:-stepA_mal} "${@}"
123 ```
124
125 ```
126 #!/bin/bash
127 exec quux $(dirname $0)/${STEP:-stepA_mal}.qx "${@}"
128 ```
129
130 This allows you to run tests against your implementation like this:
131 ```
132 make "test^quux^stepX"
133 ```
134
135 If your implementation language is a compiled language, then you
136 should also add a Makefile at the top level of your implementation
137 directory. This Makefile will define how to build the files pointed to
138 by the quux_STEP_TO_PROG macro. The top-level Makefile will attempt to
139 build those targets before running tests. If it is a scripting
140 language/uncompiled, then no Makefile is necessary because
141 quux_STEP_TO_PROG will point to a source file that already exists and
142 does not need to be compiled/built.
143
144
145 ## General hints
146
147 Stackoverflow and Google are your best friends. Modern polyglot
148 developers do not memorize dozens of programming languages. Instead,
149 they learn the peculiar terminology used with each language and then
150 use this to search for their answers.
151
152 Here are some other resources where multiple languages are
153 compared/described:
154 * http://learnxinyminutes.com/
155 * http://hyperpolyglot.org/
156 * http://rosettacode.org/
157 * http://rigaux.org/language-study/syntax-across-languages/
158
159 Do not let yourself be bogged down by specific problems. While the
160 make-a-lisp process is structured as a series of steps, the reality is
161 that building a lisp interpreter is more like a branching tree. If you
162 get stuck on tail call optimization, or hash-maps, move on to other
163 things. You will often have a stroke of inspiration for a problem as
164 you work through other functionality. I have tried to structure this
165 guide and the tests to make clear which things can be deferred until
166 later.
167
168 An aside on deferrable/optional bits: when you run the tests for
169 a given step, the last tests are often marked with an "optional"
170 header. This indicates that these are tests for functionality that is
171 not critical to finish a basic mal implementation. Many of the steps
172 in this process guide have a "Deferrable" section, however, it is not
173 quite the same meaning. Those sections include the functionality that
174 is marked as optional in the tests, but they also include
175 functionality that becomes mandatory at a later step. In other words,
176 this is a "make your own Lisp adventure".
177
178 Use test driven development. Each step of the make-a-lisp process has
179 a bunch of tests associated with it and there is an easy script to run
180 all the tests for a specific step in the process. Pick a failing test,
181 fix it, repeat until all the tests for that step pass.
182
183 ## Reference Code
184
185 The `process` directory contains abbreviated pseudocode and
186 architecture diagrams for each step of the make-a-lisp process. Use
187 a textual diff/comparison tool to compare the previous pseudocode step
188 with the one you are working on. The architecture diagram images have
189 changes from the previous step highlighted in red. There is also
190 a concise
191 [cheatsheet](http://kanaka.github.io/mal/cheatsheet.html) that
192 summarizes the key changes at each step.
193
194 If you get completely stuck and are feeling like giving up, then you
195 should "cheat" by referring to the same step or functionality in
196 a existing implementation language. You are here to learn, not to take
197 a test, so do not feel bad about it. Okay, you should feel a little
198 bit bad about it.
199
200
201 ## The Make-A-Lisp Process
202
203 In the steps that follow the name of the target language is "quux" and
204 the file extension for that language is "qx".
205
206
207 <a name="step0"></a>
208
209 ### Step 0: The REPL
210
211 ![step0_repl architecture](step0_repl.png)
212
213 This step is basically just creating a skeleton of your interpreter.
214
215 * Create a `step0_repl.qx` file in `quux/`.
216
217 * Add the 4 trivial functions `READ`, `EVAL`, `PRINT`, and `rep`
218 (read-eval-print). `READ`, `EVAL`, and `PRINT` are basically just
219 stubs that return their first parameter (a string if your target
220 language is a statically typed) and `rep` calls them in order
221 passing the return to the input of the next.
222
223 * Add a main loop that repeatedly prints a prompt (needs to be
224 "user> " for later tests to pass), gets a line of input from the
225 user, calls `rep` with that line of input, and then prints out the
226 result from `rep`. It should also exit when you send it an EOF
227 (often Ctrl-D).
228
229 * If you are using a compiled (ahead-of-time rather than just-in-time)
230 language, then create a Makefile (or appropriate project definition
231 file) in your directory.
232
233 It is time to run your first tests. This will check that your program
234 does input and output in a way that can be captured by the test
235 harness. Go to the top level and run the following:
236 ```
237 make "test^quux^step0"
238 ```
239
240 Add and then commit your new `step0_repl.qx` and `Makefile` to git.
241
242 Congratulations! You have just completed the first step of the
243 make-a-lisp process.
244
245
246 #### Optional:
247
248 * Add full line editing and command history support to your
249 interpreter REPL. Many languages have a library/module that provide
250 line editing support. Another option if your language supports it is
251 to use an FFI (foreign function interface) to load and call directly
252 into GNU readline, editline, or linenoise library. Add line
253 editing interface code to `readline.qx`
254
255
256 <a name="step1"></a>
257
258 ### Step 1: Read and Print
259
260 ![step1_read_print architecture](step1_read_print.png)
261
262 In this step, your interpreter will "read" the string from the user
263 and parse it into an internal tree data structure (an abstract syntax
264 tree) and then take that data structure and "print" it back to
265 a string.
266
267 In non-lisp languages, this step (called "lexing and parsing") can be
268 one of the most complicated parts of the compiler/interpreter. In
269 Lisp, the data structure that you want in memory is basically
270 represented directly in the code that the programmer writes
271 (homoiconicity).
272
273 For example, if the string is "(+ 2 (* 3 4))" then the read function
274 will process this into a tree structure that looks like this:
275 ```
276 List
277 / | \
278 / | \
279 / | \
280 Sym:+ Int:2 List
281 / | \
282 / | \
283 / | \
284 Sym:* Int:3 Int:4
285 ```
286
287 Each left paren and its matching right paren (lisp "sexpr") becomes
288 a node in the tree and everything else becomes a leaf in the tree.
289
290 If you can find code for an implementation of a JSON encoder/decoder
291 in your target language then you can probably just borrow and modify
292 that and be 75% of the way done with this step.
293
294 The rest of this section is going to assume that you are not starting
295 from an existing JSON encoder/decoder, but that you do have access to
296 a Perl compatible regular expressions (PCRE) module/library. You can
297 certainly implement the reader using simple string operations, but it
298 is more involved. The `make`, `ps` (postscript) and Haskell
299 implementations have examples of a reader/parser without using regular
300 expression support.
301
302 * Copy `step0_repl.qx` to `step1_read_print.qx`.
303
304 * Add a `reader.qx` file to hold functions related to the reader.
305
306 * If the target language has objects types (OOP), then the next step
307 is to create a simple stateful Reader object in `reader.qx`. This
308 object will store the tokens and a position. The Reader object will
309 have two methods: `next` and `peek`. `next` returns the token at
310 the current position and increments the position. `peek` just
311 returns the token at the current position.
312
313 * Add a function `read_str` in `reader.qx`. This function
314 will call `tokenize` and then create a new Reader object instance
315 with the tokens. Then it will call `read_form` with the Reader
316 instance.
317
318 * Add a function `tokenize` in `reader.qx`. This function will take
319 a single string and return an array/list
320 of all the tokens (strings) in it. The following regular expression
321 (PCRE) will match all mal tokens.
322 ```
323 [\s,]*(~@|[\[\]{}()'`~^@]|"(?:\\.|[^\\"])*"?|;.*|[^\s\[\]{}('"`,;)]*)
324 ```
325 * For each match captured within the parenthesis starting at char 6 of the
326 regular expression a new token will be created.
327
328 * `[\s,]*`: Matches any number of whitespaces or commas. This is not captured
329 so it will be ignored and not tokenized.
330
331 * `~@`: Captures the special two-characters `~@` (tokenized).
332
333 * ```[\[\]{}()'`~^@]```: Captures any special single character, one of
334 ```[]{}()'`~^@``` (tokenized).
335
336 * `"(?:\\.|[^\\"])*"?`: Starts capturing at a double-quote and stops at the
337 next double-quote unless it was preceded by a backslash in which case it
338 includes it until the next double-quote (tokenized). It will also
339 match unbalanced strings (no ending double-quote) which should be
340 reported as an error.
341
342 * `;.*`: Captures any sequence of characters starting with `;` (tokenized).
343
344 * ```[^\s\[\]{}('"`,;)]*```: Captures a sequence of zero or more non special
345 characters (e.g. symbols, numbers, "true", "false", and "nil") and is sort
346 of the inverse of the one above that captures special characters (tokenized).
347
348 * Add the function `read_form` to `reader.qx`. This function
349 will peek at the first token in the Reader object and switch on the
350 first character of that token. If the character is a left paren then
351 `read_list` is called with the Reader object. Otherwise, `read_atom`
352 is called with the Reader Object. The return value from `read_form`
353 is a mal data type. If your target language is statically typed then
354 you will need some way for `read_form` to return a variant or
355 subclass type. For example, if your language is object oriented,
356 then you can define a top level MalType (in `types.qx`) that all
357 your mal data types inherit from. The MalList type (which also
358 inherits from MalType) will contain a list/array of other MalTypes.
359 If your language is dynamically typed then you can likely just
360 return a plain list/array of other mal types.
361
362 * Add the function `read_list` to `reader.qx`. This function will
363 repeatedly call `read_form` with the Reader object until it
364 encounters a ')' token (if it reach EOF before reading a ')' then
365 that is an error). It accumulates the results into a List type. If
366 your language does not have a sequential data type that can hold mal
367 type values you may need to implement one (in `types.qx`). Note
368 that `read_list` repeatedly calls `read_form` rather than
369 `read_atom`. This mutually recursive definition between `read_list`
370 and `read_form` is what allows lists to contain lists.
371
372 * Add the function `read_atom` to `reader.qx`. This function will
373 look at the contents of the token and return the appropriate scalar
374 (simple/single) data type value. Initially, you can just implement
375 numbers (integers) and symbols. This will allow you to proceed
376 through the next couple of steps before you will need to implement
377 the other fundamental mal types: nil, true, false, and string. The
378 remaining scalar mal type, keyword does not
379 need to be implemented until step A (but can be implemented at any
380 point between this step and that). BTW, symbols types are just an
381 object that contains a single string name value (some languages have
382 symbol types already).
383
384 * Add a file `printer.qx`. This file will contain a single function
385 `pr_str` which does the opposite of `read_str`: take a mal data
386 structure and return a string representation of it. But `pr_str` is
387 much simpler and is basically just a switch statement on the type of
388 the input object:
389
390 * symbol: return the string name of the symbol
391 * number: return the number as a string
392 * list: iterate through each element of the list calling `pr_str` on
393 it, then join the results with a space separator, and surround the
394 final result with parens
395
396 * Change the `READ` function in `step1_read_print.qx` to call
397 `reader.read_str` and the `PRINT` function to call `printer.pr_str`.
398 `EVAL` continues to simply return its input but the type is now
399 a mal data type.
400
401 You now have enough hooked up to begin testing your code. You can
402 manually try some simple inputs:
403 * `123` -> `123`
404 * ` 123 ` -> `123`
405 * `abc` -> `abc`
406 * ` abc ` -> `abc`
407 * `(123 456)` -> `(123 456)`
408 * `( 123 456 789 ) ` -> `(123 456 789)`
409 * `( + 2 (* 3 4) ) ` -> `(+ 2 (* 3 4))`
410
411 To verify that your code is doing more than just eliminating extra
412 spaces (and not failing), you can instrument your `reader.qx` functions.
413
414 Once you have gotten past those simple manual tests, it is time to run
415 the full suite of step 1 tests. Go to the top level and run the
416 following:
417 ```
418 make "test^quux^step1"
419 ```
420
421 Fix any test failures related to symbols, numbers and lists.
422
423 Depending on the functionality of your target language, it is likely
424 that you have now just completed one of the most difficult steps. It
425 is down hill from here. The remaining steps will probably be easier
426 and each step will give progressively more bang for the buck.
427
428 #### Deferrable:
429
430
431 * Add support for the other basic data type to your reader and printer
432 functions: string, nil, true, and false. Nil, true, and false
433 become mandatory at step 4, strings at step 6. When a string is read,
434 the following transformations are
435 applied: a backslash followed by a doublequote is translated into
436 a plain doublequote character, a backslash followed by "n" is
437 translated into a newline, and a backslash followed by another
438 backslash is translated into a single backslash. To properly print
439 a string (for step 4 string functions), the `pr_str` function needs
440 another parameter called `print_readably`. When `print_readably` is
441 true, doublequotes, newlines, and backslashes are translated into
442 their printed representations (the reverse of the reader). The
443 `PRINT` function in the main program should call `pr_str` with
444 print_readably set to true.
445
446 * Add error checking to your reader functions to make sure parens
447 are properly matched. Catch and print these errors in your main
448 loop. If your language does not have try/catch style bubble up
449 exception handling, then you will need to add explicit error
450 handling to your code to catch and pass on errors without crashing.
451
452 * Add support for reader macros which are forms that are
453 transformed into other forms during the read phase. Refer to
454 `tests/step1_read_print.mal` for the form that these macros should
455 take (they are just simple transformations of the token stream).
456
457 * Add support for the other mal types: keyword, vector, hash-map.
458 * keyword: a keyword is a token that begins with a colon. A keyword
459 can just be stored as a string with special unicode prefix like
460 0x29E (or char 0xff/127 if the target language does not have good
461 unicode support) and the printer translates strings with that
462 prefix back to the keyword representation. This makes it easy to
463 use keywords as hash map keys in most languages. You can also
464 store keywords as a unique data type, but you will need to make
465 sure they can be used as hash map keys (which may involve doing
466 a similar prefixed translation anyways).
467 * vector: a vector can be implemented with same underlying
468 type as a list as long as there is some mechanism to keep track of
469 the difference. You can use the same reader function for both
470 lists and vectors by adding parameters for the starting and ending
471 tokens.
472 * hash-map: a hash-map is an associative data structure that maps
473 strings to other mal values. If you implement keywords as prefixed
474 strings, then you only need a native associative data structure
475 which supports string keys. Clojure allows any value to be a hash
476 map key, but the base functionality in mal is to support strings
477 and keyword keys. Because of the representation of hash-maps as
478 an alternating sequence of keys and values, you can probably use
479 the same reader function for hash-maps as lists and vectors with
480 parameters to indicate the starting and ending tokens. The odd
481 tokens are then used for keys with the corresponding even tokens
482 as the values.
483
484 * Add comment support to your reader. The tokenizer should ignore
485 tokens that start with ";". Your `read_str` function will need to
486 properly handle when the tokenizer returns no values. The simplest
487 way to do this is to return `nil` mal value. A cleaner option (that
488 does not print `nil` at the prompt is to throw a special exception
489 that causes the main loop to simply continue at the beginning of the
490 loop without calling `rep`.
491
492
493 <a name="step2"></a>
494
495 ### Step 2: Eval
496
497 ![step2_eval architecture](step2_eval.png)
498
499 In step 1 your mal interpreter was basically just a way to validate
500 input and eliminate extraneous white space. In this step you will turn
501 your interpreter into a simple number calculator by adding
502 functionality to the evaluator (`EVAL`).
503
504 Compare the pseudocode for step 1 and step 2 to get a basic idea of
505 the changes that will be made during this step:
506 ```
507 diff -urp ../process/step1_read_print.txt ../process/step2_eval.txt
508 ```
509
510 * Copy `step1_read_print.qx` to `step2_eval.qx`.
511
512 * Define a simple initial REPL environment. This environment is an
513 associative structure that maps symbols (or symbol names) to
514 numeric functions. For example, in python this would look something
515 like this:
516 ```
517 repl_env = {'+': lambda a,b: a+b,
518 '-': lambda a,b: a-b,
519 '*': lambda a,b: a*b,
520 '/': lambda a,b: int(a/b)}
521 ```
522
523 * Modify the `rep` function to pass the REPL environment as the second
524 parameter for the `EVAL` call.
525
526 * Create a new function `eval_ast` which takes `ast` (mal data type)
527 and an associative structure (the environment from above).
528 `eval_ast` switches on the type of `ast` as follows:
529
530 * symbol: lookup the symbol in the environment structure and return
531 the value or raise an error if no value is found
532 * list: return a new list that is the result of calling `EVAL` on
533 each of the members of the list
534 * otherwise just return the original `ast` value
535
536 * Modify `EVAL` to check if the first parameter `ast` is a list.
537 * `ast` is not a list: then return the result of calling `eval_ast`
538 on it.
539 * `ast` is a empty list: return ast unchanged.
540 * `ast` is a list: call `eval_ast` to get a new evaluated list. Take
541 the first item of the evaluated list and call it as function using
542 the rest of the evaluated list as its arguments.
543
544 If your target language does not have full variable length argument
545 support (e.g. variadic, vararg, splats, apply) then you will need to
546 pass the full list of arguments as a single parameter and split apart
547 the individual values inside of every mal function. This is annoying,
548 but workable.
549
550 The process of taking a list and invoking or executing it to return
551 something new is known in Lisp as the "apply" phase.
552
553 Try some simple expressions:
554
555 * `(+ 2 3)` -> `5`
556 * `(+ 2 (* 3 4))` -> `14`
557
558 The most likely challenge you will encounter is how to properly call
559 a function references using an arguments list.
560
561 Now go to the top level, run the step 2 tests and fix the errors.
562 ```
563 make "test^quux^step2"
564 ```
565
566 You now have a simple prefix notation calculator!
567
568 #### Deferrable:
569
570 * `eval_ast` should evaluate elements of vectors and hash-maps. Add the
571 following cases in `eval_ast`:
572 * If `ast` is a vector: return a new vector that is the result of calling
573 `EVAL` on each of the members of the vector.
574 * If `ast` is a hash-map: return a new hash-map which consists of key-value
575 pairs where the key is a key from the hash-map and the value is the result
576 of calling `EVAL` on the corresponding value.
577
578
579 <a name="step3"></a>
580
581 ### Step 3: Environments
582
583 ![step3_env architecture](step3_env.png)
584
585 In step 2 you were already introduced to REPL environment (`repl_env`)
586 where the basic numeric functions were stored and looked up. In this
587 step you will add the ability to create new environments (`let*`) and
588 modify existing environments (`def!`).
589
590 A Lisp environment is an associative data structure that maps symbols (the
591 keys) to values. But Lisp environments have an additional important
592 function: they can refer to another environment (the outer
593 environment). During environment lookups, if the current environment
594 does not have the symbol, the lookup continues in the outer
595 environment, and continues this way until the symbol is either found,
596 or the outer environment is `nil` (the outermost environment in the
597 chain).
598
599 Compare the pseudocode for step 2 and step 3 to get a basic idea of
600 the changes that will be made during this step:
601 ```
602 diff -urp ../process/step2_eval.txt ../process/step3_env.txt
603 ```
604
605 * Copy `step2_eval.qx` to `step3_env.qx`.
606
607 * Create `env.qx` to hold the environment definition.
608
609 * Define an `Env` object that is instantiated with a single `outer`
610 parameter and starts with an empty associative data structure
611 property `data`.
612
613 * Define three methods for the Env object:
614 * set: takes a symbol key and a mal value and adds to the `data`
615 structure
616 * find: takes a symbol key and if the current environment contains
617 that key then return the environment. If no key is found and outer
618 is not `nil` then call find (recurse) on the outer environment.
619 * get: takes a symbol key and uses the `find` method to locate the
620 environment with the key, then returns the matching value. If no
621 key is found up the outer chain, then throws/raises a "not found"
622 error.
623
624 * Update `step3_env.qx` to use the new `Env` type to create the
625 repl_env (with a `nil` outer value) and use the `set` method to add
626 the numeric functions.
627
628 * Modify `eval_ast` to call the `get` method on the `env` parameter.
629
630 * Modify the apply section of `EVAL` to switch on the first element of
631 the list:
632 * symbol "def!": call the set method of the current environment
633 (second parameter of `EVAL` called `env`) using the unevaluated
634 first parameter (second list element) as the symbol key and the
635 evaluated second parameter as the value.
636 * symbol "let\*": create a new environment using the current
637 environment as the outer value and then use the first parameter as
638 a list of new bindings in the "let\*" environment. Take the second
639 element of the binding list, call `EVAL` using the new "let\*"
640 environment as the evaluation environment, then call `set` on the
641 "let\*" environment using the first binding list element as the key
642 and the evaluated second element as the value. This is repeated
643 for each odd/even pair in the binding list. Note in particular,
644 the bindings earlier in the list can be referred to by later
645 bindings. Finally, the second parameter (third element) of the
646 original `let*` form is evaluated using the new "let\*" environment
647 and the result is returned as the result of the `let*` (the new
648 let environment is discarded upon completion).
649 * otherwise: call `eval_ast` on the list and apply the first element
650 to the rest as before.
651
652 `def!` and `let*` are Lisp "specials" (or "special atoms") which means
653 that they are language level features and more specifically that the
654 rest of the list elements (arguments) may be evaluated differently (or
655 not at all) unlike the default apply case where all elements of the
656 list are evaluated before the first element is invoked. Lists which
657 contain a "special" as the first element are known as "special forms".
658 They are special because they follow special evaluation rules.
659
660 Try some simple environment tests:
661
662 * `(def! a 6)` -> `6`
663 * `a` -> `6`
664 * `(def! b (+ a 2))` -> `8`
665 * `(+ a b)` -> `14`
666 * `(let* (c 2) c)` -> `2`
667
668 Now go to the top level, run the step 3 tests and fix the errors.
669 ```
670 make "test^quux^step3"
671 ```
672
673 You mal implementation is still basically just a numeric calculator
674 with save/restore capability. But you have set the foundation for step
675 4 where it will begin to feel like a real programming language.
676
677
678 An aside on mutation and typing:
679
680 The "!" suffix on symbols is used to indicate that this symbol refers
681 to a function that mutates something else. In this case, the `def!`
682 symbol indicates a special form that will mutate the current
683 environment. Many (maybe even most) of runtime problems that are
684 encountered in software engineering are a result of mutation. By
685 clearly marking code where mutation may occur, you can more easily
686 track down the likely cause of runtime problems when they do occur.
687
688 Another cause of runtime errors is type errors, where a value of one
689 type is unexpectedly treated by the program as a different and
690 incompatible type. Statically typed languages try to make the
691 programmer solve all type problems before the program is allowed to
692 run. Most Lisp variants tend to be dynamically typed (types of values
693 are checked when they are actually used at runtime).
694
695 As an aside-aside: The great debate between static and dynamic typing
696 can be understood by following the money. Advocates of strict static
697 typing use words like "correctness" and "safety" and thus get
698 government and academic funding. Advocates of dynamic typing use words
699 like "agile" and "time-to-market" and thus get venture capital and
700 commercial funding.
701
702
703 <a name="step4"></a>
704
705 ### Step 4: If Fn Do
706
707 ![step4_if_fn_do architecture](step4_if_fn_do.png)
708
709 In step 3 you added environments and the special forms for
710 manipulating environments. In this step you will add 3 new special
711 forms (`if`, `fn*` and `do`) and add several more core functions to
712 the default REPL environment. Our new architecture will look like
713 this:
714
715 The `fn*` special form is how new user-defined functions are created.
716 In some Lisps, this special form is named "lambda".
717
718 Compare the pseudocode for step 3 and step 4 to get a basic idea of
719 the changes that will be made during this step:
720 ```
721 diff -urp ../process/step3_env.txt ../process/step4_if_fn_do.txt
722 ```
723
724 * Copy `step3_env.qx` to `step4_if_fn_do.qx`.
725
726 * If you have not implemented reader and printer support (and data
727 types) for `nil`, `true` and `false`, you will need to do so for
728 this step.
729
730 * Update the constructor/initializer for environments to take two new
731 arguments: `binds` and `exprs`. Bind (`set`) each element (symbol)
732 of the binds list to the respective element of the `exprs` list.
733
734 * Add support to `printer.qx` to print functions values. A string
735 literal like "#\<function>" is sufficient.
736
737 * Add the following special forms to `EVAL`:
738
739 * `do`: Evaluate all the elements of the list using `eval_ast`
740 and return the final evaluated element.
741 * `if`: Evaluate the first parameter (second element). If the result
742 (condition) is anything other than `nil` or `false`, then evaluate
743 the second parameter (third element of the list) and return the
744 result. Otherwise, evaluate the third parameter (fourth element)
745 and return the result. If condition is false and there is no third
746 parameter, then just return `nil`.
747 * `fn*`: Return a new function closure. The body of that closure
748 does the following:
749 * Create a new environment using `env` (closed over from outer
750 scope) as the `outer` parameter, the first parameter (second
751 list element of `ast` from the outer scope) as the `binds`
752 parameter, and the parameters to the closure as the `exprs`
753 parameter.
754 * Call `EVAL` on the second parameter (third list element of `ast`
755 from outer scope), using the new environment. Use the result as
756 the return value of the closure.
757
758 If your target language does not support closures, then you will need
759 to implement `fn*` using some sort of structure or object that stores
760 the values being closed over: the first and second elements of the
761 `ast` list (function parameter list and function body) and the current
762 environment `env`. In this case, your native functions will need to be
763 wrapped in the same way. You will probably also need a method/function
764 that invokes your function object/structure for the default case of
765 the apply section of `EVAL`.
766
767 Try out the basic functionality you have implemented:
768
769 * `(fn* (a) a)` -> `#<function>`
770 * `( (fn* (a) a) 7)` -> `7`
771 * `( (fn* (a) (+ a 1)) 10)` -> `11`
772 * `( (fn* (a b) (+ a b)) 2 3)` -> `5`
773
774 * Add a new file `core.qx` and define an associative data structure
775 `ns` (namespace) that maps symbols to functions. Move the numeric
776 function definitions into this structure.
777
778 * Modify `step4_if_fn_do.qx` to iterate through the `core.ns`
779 structure and add (`set`) each symbol/function mapping to the
780 REPL environment (`repl_env`).
781
782 * Add the following functions to `core.ns`:
783 * `prn`: call `pr_str` on the first parameter with `print_readably`
784 set to true, prints the result to the screen and then return
785 `nil`. Note that the full version of `prn` is a deferrable below.
786 * `list`: take the parameters and return them as a list.
787 * `list?`: return true if the first parameter is a list, false
788 otherwise.
789 * `empty?`: treat the first parameter as a list and return true if
790 the list is empty and false if it contains any elements.
791 * `count`: treat the first parameter as a list and return the number
792 of elements that it contains.
793 * `=`: compare the first two parameters and return true if they are
794 the same type and contain the same value. In the case of equal
795 length lists, each element of the list should be compared for
796 equality and if they are the same return true, otherwise false.
797 * `<`, `<=`, `>`, and `>=`: treat the first two parameters as
798 numbers and do the corresponding numeric comparison, returning
799 either true or false.
800
801 Now go to the top level, run the step 4 tests. There are a lot of
802 tests in step 4 but all of the non-optional tests that do not involve
803 strings should be able to pass now.
804
805 ```
806 make "test^quux^step4"
807 ```
808
809 Your mal implementation is already beginning to look like a real
810 language. You have flow control, conditionals, user-defined functions
811 with lexical scope, side-effects (if you implement the string
812 functions), etc. However, our little interpreter has not quite reached
813 Lisp-ness yet. The next several steps will take your implementation
814 from a neat toy to a full featured language.
815
816 #### Deferrable:
817
818 * Implement Clojure-style variadic function parameters. Modify the
819 constructor/initializer for environments, so that if a "&" symbol is
820 encountered in the `binds` list, the next symbol in the `binds` list
821 after the "&" is bound to the rest of the `exprs` list that has not
822 been bound yet.
823
824 * Define a `not` function using mal itself. In `step4_if_fn_do.qx`
825 call the `rep` function with this string:
826 "(def! not (fn* (a) (if a false true)))".
827
828 * Implement the strings functions in `core.qx`. To implement these
829 functions, you will need to implement the string support in the
830 reader and printer (deferrable section of step 1). Each of the string
831 functions takes multiple mal values, prints them (`pr_str`) and
832 joins them together into a new string.
833 * `pr-str`: calls `pr_str` on each argument with `print_readably`
834 set to true, joins the results with " " and returns the new
835 string.
836 * `str`: calls `pr_str` on each argument with `print_readably` set
837 to false, concatenates the results together ("" separator), and
838 returns the new string.
839 * `prn`: calls `pr_str` on each argument with `print_readably` set
840 to true, joins the results with " ", prints the string to the
841 screen and then returns `nil`.
842 * `println`: calls `pr_str` on each argument with `print_readably` set
843 to false, joins the results with " ", prints the string to the
844 screen and then returns `nil`.
845
846
847 <a name="step5"></a>
848
849 ### Step 5: Tail call optimization
850
851 ![step5_tco architecture](step5_tco.png)
852
853 In step 4 you added special forms `do`, `if` and `fn*` and you defined
854 some core functions. In this step you will add a Lisp feature called
855 tail call optimization (TCO). Also called "tail recursion" or
856 sometimes just "tail calls".
857
858 Several of the special forms that you have defined in `EVAL` end up
859 calling back into `EVAL`. For those forms that call `EVAL` as the last
860 thing that they do before returning (tail call) you will just loop back
861 to the beginning of eval rather than calling it again. The advantage
862 of this approach is that it avoids adding more frames to the call
863 stack. This is especially important in Lisp languages because they tend
864 to prefer using recursion instead of iteration for control structures.
865 (Though some Lisps, such as Common Lisp, have iteration.) However, with
866 tail call optimization, recursion can be made as stack efficient as
867 iteration.
868
869 Compare the pseudocode for step 4 and step 5 to get a basic idea of
870 the changes that will be made during this step:
871 ```
872 diff -urp ../process/step4_if_fn_do.txt ../process/step5_tco.txt
873 ```
874
875 * Copy `step4_if_fn_do.qx` to `step5_tco.qx`.
876
877 * Add a loop (e.g. while true) around all code in `EVAL`.
878
879 * Modify each of the following form cases to add tail call recursion
880 support:
881 * `let*`: remove the final `EVAL` call on the second `ast` argument
882 (third list element). Set `env` (i.e. the local variable passed in
883 as second parameter of `EVAL`) to the new let environment. Set
884 `ast` (i.e. the local variable passed in as first parameter of
885 `EVAL`) to be the second `ast` argument. Continue at the beginning
886 of the loop (no return).
887 * `do`: change the `eval_ast` call to evaluate all the parameters
888 except for the last (2nd list element up to but not including
889 last). Set `ast` to the last element of `ast`. Continue
890 at the beginning of the loop (`env` stays unchanged).
891 * `if`: the condition continues to be evaluated, however, rather
892 than evaluating the true or false branch, `ast` is set to the
893 unevaluated value of the chosen branch. Continue at the beginning
894 of the loop (`env` is unchanged).
895
896 * The return value from the `fn*` special form will now become an
897 object/structure with attributes that allow the default invoke case
898 of `EVAL` to do TCO on mal functions. Those attributes are:
899 * `ast`: the second `ast` argument (third list element) representing
900 the body of the function.
901 * `params`: the first `ast` argument (second list element)
902 representing the parameter names of the function.
903 * `env`: the current value of the `env` parameter of `EVAL`.
904 * `fn`: the original function value (i.e. what was return by `fn*`
905 in step 4). Note that this is deferrable until step 9 when it is
906 required for the `map` and `apply` core functions). You will also
907 need it in step 6 if you choose to not to defer atoms/`swap!` from
908 that step.
909
910 * The default "apply"/invoke case of `EVAL` must now be changed to
911 account for the new object/structure returned by the `fn*` form.
912 Continue to call `eval_ast` on `ast`. The first element is `f`.
913 Switch on the type of `f`:
914 * regular function (not one defined by `fn*`): apply/invoke it as
915 before (in step 4).
916 * a `fn*` value: set `ast` to the `ast` attribute of `f`. Generate
917 a new environment using the `env` and `params` attributes of `f`
918 as the `outer` and `binds` arguments and rest `ast` arguments
919 (list elements 2 through the end) as the `exprs` argument. Set
920 `env` to the new environment. Continue at the beginning of the loop.
921
922 Run some manual tests from previous steps to make sure you have not
923 broken anything by adding TCO.
924
925 Now go to the top level, run the step 5 tests.
926
927 ```
928 make "test^quux^step5"
929 ```
930
931 Look at the step 5 test file `tests/step5_tco.mal`. The `sum-to`
932 function cannot be tail call optimized because it does something after
933 the recursive call (`sum-to` calls itself and then does the addition).
934 Lispers say that the `sum-to` is not in tail position. The `sum2`
935 function however, calls itself from tail position. In other words, the
936 recursive call to `sum2` is the last action that `sum2` does. Calling
937 `sum-to` with a large value will cause a stack overflow exception in
938 most target languages (some have super-special tricks they use to
939 avoid stack overflows).
940
941 Congratulations, your mal implementation already has a feature (TCO)
942 that most mainstream languages lack.
943
944
945 <a name="step6"></a>
946
947 ### Step 6: Files, Mutation, and Evil
948
949 ![step6_file architecture](step6_file.png)
950
951 In step 5 you added tail call optimization. In this step you will add
952 some string and file operations and give your implementation a touch
953 of evil ... er, eval. And as long as your language supports function
954 closures, this step will be quite simple. However, to complete this
955 step, you must implement string type support, so if you have been
956 holding off on that you will need to go back and do so.
957
958 Compare the pseudocode for step 5 and step 6 to get a basic idea of
959 the changes that will be made during this step:
960 ```
961 diff -urp ../process/step5_tco.txt ../process/step6_file.txt
962 ```
963
964 * Copy `step5_tco.qx` to `step6_file.qx`.
965
966 * Add two new string functions to the core namespaces:
967 * `read-string`: this function just exposes the `read_str` function
968 from the reader. If your mal string type is not the same as your
969 target language (e.g. statically typed language) then your
970 `read-string` function will need to unbox (extract) the raw string
971 from the mal string type in order to call `read_str`.
972 * `slurp`: this function takes a file name (string) and returns the
973 contents of the file as a string. Once again, if your mal string
974 type wraps a raw target language string, then you will need to
975 unmarshall (extract) the string parameter to get the raw file name
976 string and marshall (wrap) the result back to a mal string type.
977
978 * In your main program, add a new symbol "eval" to your REPL
979 environment. The value of this new entry is a function that takes
980 a single argument `ast`. The closure calls the your `EVAL` function
981 using the `ast` as the first argument and the REPL environment
982 (closed over from outside) as the second argument. The result of
983 the `EVAL` call is returned. This simple but powerful addition
984 allows your program to treat mal data as a mal program. For example,
985 you can now to this:
986 ```
987 (def! mal-prog (list + 1 2))
988 (eval mal-prog)
989 ```
990
991 * Define a `load-file` function using mal itself. In your main
992 program call the `rep` function with this string:
993 "(def! load-file (fn* (f) (eval (read-string (str \"(do \" (slurp f) \")\")))))".
994
995 Try out `load-file`:
996 * `(load-file "../tests/incA.mal")` -> `9`
997 * `(inc4 3)` -> `7`
998
999 The `load-file` function does the following:
1000 * Call `slurp` to read in a file by name. Surround the contents with
1001 "(do ...)" so that the whole file will be treated as a single
1002 program AST (abstract syntax tree).
1003 * Call `read-string` on the string returned from `slurp`. This uses
1004 the reader to read/convert the file contents into mal data/AST.
1005 * Call `eval` (the one in the REPL environment) on the AST returned
1006 from `read-string` to "run" it.
1007
1008 Besides adding file and eval support, we'll add support for the atom data type
1009 in this step. An atom is the Mal way to represent *state*; it is
1010 heavily inspired by [Clojure's atoms](http://clojure.org/state). An atom holds
1011 a reference to a single Mal value of any type; it supports reading that Mal value
1012 and *modifying* the reference to point to another Mal value. Note that this is
1013 the only Mal data type that is mutable (but the Mal values it refers to are
1014 still immutable; immutability is explained in greater detail in step 7).
1015 You'll need to add 5 functions to the core namespace to support atoms:
1016
1017 * `atom`: Takes a Mal value and returns a new atom which points to that Mal value.
1018 * `atom?`: Takes an argument and returns `true` if the argument is an atom.
1019 * `deref`: Takes an atom argument and returns the Mal value referenced by this atom.
1020 * `reset!`: Takes an atom and a Mal value; the atom is modified to refer to
1021 the given Mal value. The Mal value is returned.
1022 * `swap!`: Takes an atom, a function, and zero or more function arguments. The
1023 atom's value is modified to the result of applying the function with the atom's
1024 value as the first argument and the optionally given function arguments as
1025 the rest of the arguments. The new atom's value is returned. (Side note: Mal is
1026 single-threaded, but in concurrent languages like Clojure, `swap!` promises
1027 atomic update: `(swap! myatom (fn* [x] (+ 1 x)))` will always increase the
1028 `myatom` counter by one and will not suffer from missing updates when the
1029 atom is updated from multiple threads.)
1030
1031 Optionally, you can add a reader macro `@` which will serve as a short form for
1032 `deref`, so that `@a` is equivalent to `(deref a)`. In order to do that, modify
1033 the conditional in reader `read_form` function and add a case which deals with
1034 the `@` token: if the token is `@` (at sign) then return a new list that
1035 contains the symbol `deref` and the result of reading the next form
1036 (`read_form`).
1037
1038 Now go to the top level, run the step 6 tests. The optional tests will
1039 need support from the reader for comments, vectors, hash-maps and the `@`
1040 reader macro:
1041 ```
1042 make "test^quux^step6"
1043 ```
1044
1045 Congratulations, you now have a full-fledged scripting language that
1046 can run other mal programs. The `slurp` function loads a file as
1047 a string, the `read-string` function calls the mal reader to turn that
1048 string into data, and the `eval` function takes data and evaluates it
1049 as a normal mal program. However, it is important to note that the
1050 `eval` function is not just for running external programs. Because mal
1051 programs are regular mal data structures, you can dynamically generate
1052 or manipulate those data structures before calling `eval` on them.
1053 This isomorphism (same shape) between data and programs is known as
1054 "homoiconicity". Lisp languages are homoiconic and this property
1055 distinguishes them from most other programming languages.
1056
1057 You mal implementation is quite powerful already but the set of
1058 functions that are available (from `core.qx`) is fairly limited. The
1059 bulk of the functions you will add are described in step 9 and step A,
1060 but you will begin to flesh them out over the next few steps to
1061 support quoting (step 7) and macros (step 8).
1062
1063
1064 #### Deferrable:
1065
1066 * Add the ability to run another mal program from the command line.
1067 Prior to the REPL loop, check if your mal implementation is called
1068 with command line arguments. If so, treat the first argument as
1069 a filename and use `rep` to call `load-file` on that filename, and
1070 finally exit/terminate execution.
1071
1072 * Add the rest of the command line arguments to your REPL environment
1073 so that programs that are run with `load-file` have access to their
1074 calling environment. Add a new "\*ARGV\*" (symbol) entry to your REPL
1075 environment. The value of this entry should be the rest of the
1076 command line arguments as a mal list value.
1077
1078
1079 <a name="step7"></a>
1080
1081 ### Step 7: Quoting
1082
1083 ![step7_quote architecture](step7_quote.png)
1084
1085 In step 7 you will add the special forms `quote` and `quasiquote` and
1086 add supporting core functions `cons` and `concat`. The two quote forms
1087 add a powerful abstraction for manipulating mal code itself
1088 (meta-programming).
1089
1090 The `quote` special form indicates to the evaluator (`EVAL`) that the
1091 parameter should not be evaluated (yet). At first glance, this might
1092 not seem particularly useful but an example of what this enables is the
1093 ability for a mal program to refer to a symbol itself rather than the
1094 value that it evaluates to. Likewise with lists. For example, consider
1095 the following:
1096
1097 * `(prn abc)`: this will lookup the symbol `abc` in the current
1098 evaluation environment and print it. This will result in error if
1099 `abc` is not defined.
1100 * `(prn (quote abc))`: this will print "abc" (prints the symbol
1101 itself). This will work regardless of whether `abc` is defined in
1102 the current environment.
1103 * `(prn (1 2 3))`: this will result in an error because `1` is not
1104 a function and cannot be applied to the arguments `(2 3)`.
1105 * `(prn (quote (1 2 3)))`: this will print "(1 2 3)".
1106 * `(def! l (quote (1 2 3)))`: list quoting allows us to define lists
1107 directly in the code (list literal). Another way of doing this is
1108 with the list function: `(def! l (list 1 2 3))`.
1109
1110 The second special quoting form is `quasiquote`. This allows a quoted
1111 list to have internal elements of the list that are temporarily
1112 unquoted (normal evaluation). There are two special forms that only
1113 mean something within a quasiquoted list: `unquote` and
1114 `splice-unquote`. These are perhaps best explained with some examples:
1115
1116 * `(def! lst (quote (2 3)))` -> `(2 3)`
1117 * `(quasiquote (1 (unquote lst)))` -> `(1 (2 3))`
1118 * `(quasiquote (1 (splice-unquote lst)))` -> `(1 2 3)`
1119
1120 The `unquote` form turns evaluation back on for its argument and the
1121 result of evaluation is put in place into the quasiquoted list. The
1122 `splice-unquote` also turns evaluation back on for its argument, but
1123 the evaluated value must be a list which is then "spliced" into the
1124 quasiquoted list. The true power of the quasiquote form will be
1125 manifest when it is used together with macros (in the next step).
1126
1127 Compare the pseudocode for step 6 and step 7 to get a basic idea of
1128 the changes that will be made during this step:
1129 ```
1130 diff -urp ../process/step6_file.txt ../process/step7_quote.txt
1131 ```
1132
1133 * Copy `step6_file.qx` to `step7_quote.qx`.
1134
1135 * Before implementing the quoting forms, you will need to implement
1136 some supporting functions in the core namespace:
1137 * `cons`: this function takes a list as its second
1138 parameter and returns a new list that has the first argument
1139 prepended to it.
1140 * `concat`: this functions takes 0 or more lists as
1141 parameters and returns a new list that is a concatenation of all
1142 the list parameters.
1143
1144 An aside on immutability: note that neither cons or concat mutate
1145 their original list arguments. Any references to them (i.e. other
1146 lists that they may be "contained" in) will still refer to the
1147 original unchanged value. Mal, like Clojure, is a language which uses
1148 immutable data structures. I encourage you to read about the power and
1149 importance of immutability as implemented in Clojure (from which
1150 Mal borrows most of its syntax and feature-set).
1151
1152 * Add the `quote` special form. This form just returns its argument
1153 (the second list element of `ast`).
1154
1155 * Add the `quasiquote` special form. First implement a helper function
1156 `is_pair` that returns true if the parameter is a non-empty list.
1157 Then define a `quasiquote` function. This is called from `EVAL` with
1158 the first `ast` argument (second list element) and then `ast` is set
1159 to the result and execution continues at the top of the loop (TCO).
1160 The `quasiquote` function takes a parameter `ast` and has the
1161 following conditional:
1162 1. if `is_pair` of `ast` is false: return a new list containing:
1163 a symbol named "quote" and `ast`.
1164 2. else if the first element of `ast` is a symbol named "unquote":
1165 return the second element of `ast`.
1166 3. if `is_pair` of the first element of `ast` is true and the first
1167 element of first element of `ast` (`ast[0][0]`) is a symbol named
1168 "splice-unquote": return a new list containing: a symbol named
1169 "concat", the second element of first element of `ast`
1170 (`ast[0][1]`), and the result of calling `quasiquote` with the
1171 second through last element of `ast`.
1172 4. otherwise: return a new list containing: a symbol named "cons", the
1173 result of calling `quasiquote` on first element of `ast`
1174 (`ast[0]`), and the result of calling `quasiquote` with the second
1175 through last element of `ast`.
1176
1177
1178 Now go to the top level, run the step 7 tests:
1179 ```
1180 make "test^quux^step7"
1181 ```
1182
1183 Quoting is one of the more mundane functions available in mal, but do
1184 not let that discourage you. Your mal implementation is almost
1185 complete, and quoting sets the stage for the next very exciting step:
1186 macros.
1187
1188
1189 #### Deferrable
1190
1191 * The full names for the quoting forms are fairly verbose. Most Lisp
1192 languages have a short-hand syntax and Mal is no exception. These
1193 short-hand syntaxes are known as reader macros because they allow us
1194 to manipulate mal code during the reader phase. Macros that run
1195 during the eval phase are just called "macros" and are described in
1196 the next section. Expand the conditional with reader `read_form`
1197 function to add the following four cases:
1198 * token is "'" (single quote): return a new list that contains the
1199 symbol "quote" and the result of reading the next form
1200 (`read_form`).
1201 * token is "\`" (back-tick): return a new list that contains the
1202 symbol "quasiquote" and the result of reading the next form
1203 (`read_form`).
1204 * token is "~" (tilde): return a new list that contains the
1205 symbol "unquote" and the result of reading the next form
1206 (`read_form`).
1207 * token is "~@" (tilde + at sign): return a new list that contains
1208 the symbol "splice-unquote" and the result of reading the next
1209 form (`read_form`).
1210
1211 * Add support for quoting of vectors. The `is_pair` function should
1212 return true if the argument is a non-empty list or vector. `cons`
1213 should also accept a vector as the second argument. The return value
1214 is a list regardless. `concat` should support concatenation of
1215 lists, vectors, or a mix or both. The result is always a list.
1216
1217
1218 <a name="step8"></a>
1219
1220 ### Step 8: Macros
1221
1222 ![step8_macros architecture](step8_macros.png)
1223
1224 Your mal implementation is now ready for one of the most lispy and
1225 exciting of all programming concepts: macros. In the previous step,
1226 quoting enabled some simple manipulation data structures and therefore
1227 manipulation of mal code (because the `eval` function from step
1228 6 turns mal data into code). In this step you will be able to mark mal
1229 functions as macros which can manipulate mal code before it is
1230 evaluated. In other words, macros are user-defined special forms. Or
1231 to look at it another way, macros allow mal programs to redefine
1232 the mal language itself.
1233
1234 Compare the pseudocode for step 7 and step 8 to get a basic idea of
1235 the changes that will be made during this step:
1236 ```
1237 diff -urp ../process/step7_quote.txt ../process/step8_macros.txt
1238 ```
1239
1240 * Copy `step7_quote.qx` to `step8_macros.qx`.
1241
1242
1243 You might think that the infinite power of macros would require some
1244 sort of complex mechanism, but the implementation is actually fairly
1245 simple.
1246
1247 * Add a new attribute `is_macro` to mal function types. This should
1248 default to false.
1249
1250 * Add a new special form `defmacro!`. This is very similar to the
1251 `def!` form, but before the evaluated value (mal function) is set in
1252 the environment, the `is_macro` attribute should be set to true.
1253
1254 * Add a `is_macro_call` function: This function takes arguments `ast`
1255 and `env`. It returns true if `ast` is a list that contains a symbol
1256 as the first element and that symbol refers to a function in the
1257 `env` environment and that function has the `is_macro` attribute set
1258 to true. Otherwise, it returns false.
1259
1260 * Add a `macroexpand` function: This function takes arguments `ast`
1261 and `env`. It calls `is_macro_call` with `ast` and `env` and loops
1262 while that condition is true. Inside the loop, the first element of
1263 the `ast` list (a symbol), is looked up in the environment to get
1264 the macro function. This macro function is then called/applied with
1265 the rest of the `ast` elements (2nd through the last) as arguments.
1266 The return value of the macro call becomes the new value of `ast`.
1267 When the loop completes because `ast` no longer represents a macro
1268 call, the current value of `ast` is returned.
1269
1270 * In the evaluator (`EVAL`) before the special forms switch (apply
1271 section), perform macro expansion by calling the `macroexpand`
1272 function with the current value of `ast` and `env`. Set `ast` to the
1273 result of that call. If the new value of `ast` is no longer a list
1274 after macro expansion, then return the result of calling `eval_ast`
1275 on it, otherwise continue with the rest of the apply section
1276 (special forms switch).
1277
1278 * Add a new special form condition for `macroexpand`. Call the
1279 `macroexpand` function using the first `ast` argument (second list
1280 element) and `env`. Return the result. This special form allows
1281 a mal program to do explicit macro expansion without applying the
1282 result (which can be useful for debugging macro expansion).
1283
1284 Now go to the top level, run the step 8 tests:
1285 ```
1286 make "test^quux^step8"
1287 ```
1288
1289 There is a reasonably good chance that the macro tests will not pass
1290 the first time. Although the implementation of macros is fairly
1291 simple, debugging runtime bugs with macros can be fairly tricky. If
1292 you do run into subtle problems that are difficult to solve, let me
1293 recommend a couple of approaches:
1294
1295 * Use the macroexpand special form to eliminate one of the layers of
1296 indirection (to expand but skip evaluate). This will often reveal
1297 the source of the issue.
1298 * Add a debug print statement to the top of your main `eval` function
1299 (inside the TCO loop) to print the current value of `ast` (hint use
1300 `pr_str` to get easier to debug output). Pull up the step8
1301 implementation from another language and uncomment its `eval`
1302 function (yes, I give you permission to violate the rule this once).
1303 Run the two side-by-side. The first difference is likely to point to
1304 the bug.
1305
1306 Congratulations! You now have a Lisp interpreter with a super power
1307 that most non-Lisp languages can only dream of (I have it on good
1308 authority that languages dream when you are not using them). If you
1309 are not already familiar with Lisp macros, I suggest the following
1310 exercise: write a recursive macro that handles postfixed mal code
1311 (with the function as the last parameter instead of the first). Or
1312 not. I have not actually done so myself, but I have heard it is an
1313 interesting exercise.
1314
1315 In the next step you will add try/catch style exception handling to
1316 your implementation in addition to some new core functions. After
1317 step9 you will be very close to having a fully self-hosting mal
1318 implementation. Let us continue!
1319
1320
1321 #### Deferrable
1322
1323 * Add the following new core functions which are frequently used in
1324 macro functions:
1325 * `nth`: this function takes a list (or vector) and a number (index)
1326 as arguments, returns the element of the list at the given index.
1327 If the index is out of range, this function raises an exception.
1328 * `first`: this function takes a list (or vector) as its argument
1329 and return the first element. If the list (or vector) is empty or
1330 is `nil` then `nil` is returned.
1331 * `rest`: this function takes a list (or vector) as its argument and
1332 returns a new list containing all the elements except the first.
1333
1334 * In the main program, use the `rep` function to define two new
1335 control structures macros. Here are the string arguments for `rep`
1336 to define these macros:
1337 * `cond`: "(defmacro! cond (fn* (& xs) (if (> (count xs) 0) (list 'if (first xs) (if (> (count xs) 1) (nth xs 1) (throw \"odd number of forms to cond\")) (cons 'cond (rest (rest xs)))))))"
1338 * Note that `cond` calls the `throw` function when `cond` is
1339 called with an odd number of args. The `throw` function is
1340 implemented in the next step, but it will still serve it's
1341 purpose here by causing an undefined symbol error.
1342 * `or`: "(defmacro! or (fn* (& xs) (if (empty? xs) nil (if (= 1 (count xs)) (first xs) `(let* (or_FIXME ~(first xs)) (if or_FIXME or_FIXME (or ~@(rest xs))))))))"
1343
1344
1345 <a name="step9"></a>
1346
1347 ### Step 9: Try
1348
1349 ![step9_try architecture](step9_try.png)
1350
1351 In this step you will implement the final mal special form for
1352 error/exception handling: `try*/catch*`. You will also add several core
1353 functions to your implementation. In particular, you will enhance the
1354 functional programming pedigree of your implementation by adding the
1355 `apply` and `map` core functions.
1356
1357 Compare the pseudocode for step 8 and step 9 to get a basic idea of
1358 the changes that will be made during this step:
1359 ```
1360 diff -urp ../process/step8_macros.txt ../process/step9_try.txt
1361 ```
1362
1363 * Copy `step8_macros.qx` to `step9_try.qx`.
1364
1365 * Add the `try*/catch*` special form to the EVAL function. The
1366 try catch form looks like this: `(try* A (catch* B C))`. The form
1367 `A` is evaluated, if it throws an exception, then form `C` is
1368 evaluated with a new environment that binds the symbol `B` to the
1369 value of the exception that was thrown.
1370 * If your target language has built-in try/catch style exception
1371 handling then you are already 90% of the way done. Add a
1372 (native language) try/catch block that evaluates `A` within
1373 the try block and catches all exceptions. If an exception is
1374 caught, then translate it to a mal type/value. For native
1375 exceptions this is either the message string or a mal hash-map
1376 that contains the message string and other attributes of the
1377 exception. When a regular mal type/value is used as an
1378 exception, you will probably need to store it within a native
1379 exception type in order to be able to convey/transport it using
1380 the native try/catch mechanism. Then you will extract the mal
1381 type/value from the native exception. Create a new mal environment
1382 that binds `B` to the value of the exception. Finally, evaluate `C`
1383 using that new environment.
1384 * If your target language does not have built-in try/catch style
1385 exception handling then you have some extra work to do. One of the
1386 most straightforward approaches is to create a a global error
1387 variable that stores the thrown mal type/value. The complication
1388 is that there are a bunch of places where you must check to see if
1389 the global error state is set and return without proceeding. The
1390 rule of thumb is that this check should happen at the top of your
1391 EVAL function and also right after any call to EVAL (and after any
1392 function call that might happen to call EVAL further down the
1393 chain). Yes, it is ugly, but you were warned in the section on
1394 picking a language.
1395
1396 * Add the `throw` core function.
1397 * If your language supports try/catch style exception handling, then
1398 this function takes a mal type/value and throws/raises it as an
1399 exception. In order to do this, you may need to create a custom
1400 exception object that wraps a mal value/type.
1401 * If your language does not support try/catch style exception
1402 handling, then set the global error state to the mal type/value.
1403
1404 * Add the `apply` and `map` core functions. In step 5, if you did not
1405 add the original function (`fn`) to the structure returned from
1406 `fn*`, the you will need to do so now.
1407 * `apply`: takes at least two arguments. The first argument is
1408 a function and the last argument is list (or vector). The
1409 arguments between the function and the last argument (if there are
1410 any) are concatenated with the final argument to create the
1411 arguments that are used to call the function. The apply
1412 function allows a function to be called with arguments that are
1413 contained in a list (or vector). In other words, `(apply F A B [C
1414 D])` is equivalent to `(F A B C D)`.
1415 * `map`: takes a function and a list (or vector) and evaluates the
1416 function against every element of the list (or vector) one at
1417 a time and returns the results as a list.
1418
1419 * Add some type predicates core functions. In Lisp, predicates are
1420 functions that return true/false (or true value/nil) and typically
1421 end in "?" or "p".
1422 * `nil?`: takes a single argument and returns true (mal true value)
1423 if the argument is nil (mal nil value).
1424 * `true?`: takes a single argument and returns true (mal true value)
1425 if the argument is a true value (mal true value).
1426 * `false?`: takes a single argument and returns true (mal true
1427 value) if the argument is a false value (mal false value).
1428 * `symbol?`: takes a single argument and returns true (mal true
1429 value) if the argument is a symbol (mal symbol value).
1430
1431 Now go to the top level, run the step 9 tests:
1432 ```
1433 make "test^quux^step9"
1434 ```
1435
1436 Your mal implementation is now essentially a fully featured Lisp
1437 interpreter. But if you stop now you will miss one of the most
1438 satisfying and enlightening aspects of creating a mal implementation:
1439 self-hosting.
1440
1441 #### Deferrable
1442
1443 * Add the following new core functions:
1444 * `symbol`: takes a string and returns a new symbol with the string
1445 as its name.
1446 * `keyword`: takes a string and returns a keyword with the same name
1447 (usually just be prepending the special keyword
1448 unicode symbol). This function should also detect if the argument
1449 is already a keyword and just return it.
1450 * `keyword?`: takes a single argument and returns true (mal true
1451 value) if the argument is a keyword, otherwise returns false (mal
1452 false value).
1453 * `vector`: takes a variable number of arguments and returns
1454 a vector containing those arguments.
1455 * `vector?`: takes a single argument and returns true (mal true
1456 value) if the argument is a vector, otherwise returns false (mal
1457 false value).
1458 * `sequential?`: takes a single argument and returns true (mal true
1459 value) if it is a list or a vector, otherwise returns false (mal
1460 false value).
1461 * `hash-map`: takes a variable but even number of arguments and
1462 returns a new mal hash-map value with keys from the odd arguments
1463 and values from the even arguments respectively. This is basically
1464 the functional form of the `{}` reader literal syntax.
1465 * `map?`: takes a single argument and returns true (mal true
1466 value) if the argument is a hash-map, otherwise returns false (mal
1467 false value).
1468 * `assoc`: takes a hash-map as the first argument and the remaining
1469 arguments are odd/even key/value pairs to "associate" (merge) into
1470 the hash-map. Note that the original hash-map is unchanged
1471 (remember, mal values are immutable), and a new hash-map
1472 containing the old hash-maps key/values plus the merged key/value
1473 arguments is returned.
1474 * `dissoc`: takes a hash-map and a list of keys to remove from the
1475 hash-map. Again, note that the original hash-map is unchanged and
1476 a new hash-map with the keys removed is returned. Key arguments
1477 that do not exist in the hash-map are ignored.
1478 * `get`: takes a hash-map and a key and returns the value of looking
1479 up that key in the hash-map. If the key is not found in the
1480 hash-map then nil is returned.
1481 * `contains?`: takes a hash-map and a key and returns true (mal true
1482 value) if the key exists in the hash-map and false (mal false
1483 value) otherwise.
1484 * `keys`: takes a hash-map and returns a list (mal list value) of
1485 all the keys in the hash-map.
1486 * `vals`: takes a hash-map and returns a list (mal list value) of
1487 all the values in the hash-map.
1488
1489
1490 <a name="stepA"></a>
1491
1492 ### Step A: Metadata, Self-hosting and Interop
1493
1494 ![stepA_mal architecture](stepA_mal.png)
1495
1496 You have reached the final step of your mal implementation. This step
1497 is kind of a catchall for things that did not fit into other steps.
1498 But most importantly, the changes you make in this step will unlock
1499 the magical power known as "self-hosting". You might have noticed
1500 that one of the languages that mal is implemented in is "mal". Any mal
1501 implementation that is complete enough can run the mal implementation
1502 of mal. You might need to pull out your hammock and ponder this for
1503 a while if you have never built a compiler or interpreter before. Look
1504 at the step source files for the mal implementation of mal (it is not
1505 cheating now that you have reached step A).
1506
1507 If you deferred the implementation of keywords, vectors and hash-maps,
1508 now is the time to go back and implement them if you want your
1509 implementation to self-host.
1510
1511 Compare the pseudocode for step 9 and step A to get a basic idea of
1512 the changes that will be made during this step:
1513 ```
1514 diff -urp ../process/step9_try.txt ../process/stepA_mal.txt
1515 ```
1516
1517 * Copy `step9_try.qx` to `stepA_mal.qx`.
1518
1519 * Add the `readline` core function. This functions takes a
1520 string that is used to prompt the user for input. The line of text
1521 entered by the user is returned as a string. If the user sends an
1522 end-of-file (usually Ctrl-D), then nil is returned.
1523
1524 * Add meta-data support to mal functions by adding a new metadata
1525 attribute on mal functions that refers to another mal value/type
1526 (nil by default). Add the following metadata related core functions:
1527 * `meta`: this takes a single mal function argument and returns the
1528 value of the metadata attribute.
1529 * `with-meta`: this function takes two arguments. The first argument
1530 is a mal function and the second argument is another mal
1531 value/type to set as metadata. A copy of the mal function is
1532 returned that has its `meta` attribute set to the second argument.
1533 Note that it is important that the environment and macro attribute
1534 of mal function are retained when it is copied.
1535 * Add a reader-macro that expands the token "^" to
1536 return a new list that contains the symbol "with-meta" and the
1537 result of reading the next next form (2nd argument) (`read_form`) and the
1538 next form (1st argument) in that order
1539 (metadata comes first with the ^ macro and the function second).
1540
1541 * Add a new "\*host-language\*" (symbol) entry to your REPL
1542 environment. The value of this entry should be a mal string
1543 containing the name of the current implementation.
1544
1545 * When the REPL starts up (as opposed to when it is called with
1546 a script and/or arguments), call the `rep` function with this string
1547 to print a startup header:
1548 "(println (str \"Mal [\" \*host-language\* \"]\"))".
1549
1550
1551 Now go to the top level, run the step A tests:
1552 ```
1553 make "test^quux^stepA"
1554 ```
1555
1556 Once you have passed all the non-optional step A tests, it is time to
1557 try self-hosting. Run your step A implementation as normal, but use
1558 the file argument mode you added in step 6 to run a each of the step
1559 from the mal implementation:
1560 ```
1561 ./stepA_mal.qx ../mal/step1_read_print.mal
1562 ./stepA_mal.qx ../mal/step2_eval.mal
1563 ...
1564 ./stepA_mal.qx ../mal/step9_try.mal
1565 ./stepA_mal.qx ../mal/stepA_mal.mal
1566 ```
1567
1568 There is a very good chance that you will encounter an error at some
1569 point while trying to run the mal in mal implementation steps above.
1570 Debugging failures that happen while self-hosting is MUCH more
1571 difficult and mind bending. One of the best approaches I have
1572 personally found is to add prn statements to the mal implementation
1573 step (not your own implementation of mal) that is causing problems.
1574
1575 Another approach I have frequently used is to pull out the code from
1576 the mal implementation that is causing the problem and simplify it
1577 step by step until you have a simple piece of mal code that still
1578 reproduces the problem. Once the reproducer is simple enough you will
1579 probably know where in your own implementation that problem is likely
1580 to be. Please add your simple reproducer as a test case so that future
1581 implementers will fix similar issues in their code before they get to
1582 self-hosting when it is much more difficult to track down and fix.
1583
1584 Once you can manually run all the self-hosted steps, it is time to run
1585 all the tests in self-hosted mode:
1586 ```
1587 make MAL_IMPL=quux "test^mal"
1588 ```
1589
1590 When you run into problems (which you almost certainly will), use the
1591 same process described above to debug them.
1592
1593 Congratulations!!! When all the tests pass, you should pause for
1594 a moment and consider what you have accomplished. You have implemented
1595 a Lisp interpreter that is powerful and complete enough to run a large
1596 mal program which is itself an implementation of the mal language. You
1597 might even be asking if you can continue the "inception" by using your
1598 implementation to run a mal implementation which itself runs the mal
1599 implementation.
1600
1601
1602 #### Optional: gensym
1603
1604 The `or` macro we introduced at step 8 has a bug. It defines a
1605 variable called `or_FIXME`, which "shadows" such a binding from the
1606 user's code (which uses the macro). If a user has a variable called
1607 `or_FIXME`, it cannot be used as an `or` macro argument. In order to
1608 fix that, we'll introduce `gensym`: a function which returns a symbol
1609 which was never used before anywhere in the program. This is also an
1610 example for the use of mal atoms to keep state (the state here being
1611 the number of symbols produced by `gensym` so far).
1612
1613 Previously you used `rep` to define the `or` macro. Remove that
1614 definition and use `rep` to define the new counter, `gensym` function
1615 and the clean `or` macro. Here are the string arguments you need to
1616 pass to `rep`:
1617 ```
1618 "(def! inc (fn* [x] (+ x 1)))"
1619
1620 "(def! gensym (let* [counter (atom 0)] (fn* [] (symbol (str \"G__\" (swap! counter inc))))))"
1621
1622 "(defmacro! or (fn* (& xs) (if (empty? xs) nil (if (= 1 (count xs)) (first xs) (let* (condvar (gensym)) `(let* (~condvar ~(first xs)) (if ~condvar ~condvar (or ~@(rest xs)))))))))"
1623 ```
1624
1625 For extra information read [Peter Seibel's thorough discussion about
1626 `gensym` and leaking macros in Common Lisp](http://www.gigamonkeys.com/book/macros-defining-your-own.html#plugging-the-leaks).
1627
1628
1629 #### Optional additions
1630
1631 * Add metadata support to other composite data types (lists, vectors
1632 and hash-maps), and to native functions.
1633 * Add the following new core functions:
1634 * `time-ms`: takes no arguments and returns the number of
1635 milliseconds since epoch (00:00:00 UTC January 1, 1970), or, if
1636 not possible, since another point in time (`time-ms` is usually
1637 used relatively to measure time durations). After `time-ms` is
1638 implemented, you can run the performance micro-benchmarks by
1639 running `make perf^quux`.
1640 * `conj`: takes a collection and one or more elements as arguments
1641 and returns a new collection which includes the original
1642 collection and the new elements. If the collection is a list, a
1643 new list is returned with the elements inserted at the start of
1644 the given list in opposite order; if the collection is a vector, a
1645 new vector is returned with the elements added to the end of the
1646 given vector.
1647 * `string?`: returns true if the parameter is a string.
1648 * `number?`: returns true if the parameter is a number.
1649 * `fn?`: returns true if the parameter is a function (internal or
1650 user-defined).
1651 * `macro?`: returns true if the parameter is a macro.
1652 * `seq`: takes a list, vector, string, or nil. If an empty list,
1653 empty vector, or empty string ("") is passed in then nil is
1654 returned. Otherwise, a list is returned unchanged, a vector is
1655 converted into a list, and a string is converted to a list that
1656 containing the original string split into single character
1657 strings.
1658 * For interop with the target language, add this core function:
1659 * `quux-eval`: takes a string, evaluates it in the target language,
1660 and returns the result converted to the relevant Mal type. You
1661 may also add other interop functions as you see fit; Clojure, for
1662 example, has a function called `.` which allows calling Java
1663 methods. If the target language is a static language, consider
1664 using FFI or some language-specific reflection mechanism, if
1665 available. The tests for `quux-eval` and any other interop
1666 function should be added in `quux/tests/stepA_mal.mal` (see the
1667 [tests for `lua-eval`](../lua/tests/stepA_mal.mal) as an example).
1668
1669 ### Next Steps
1670
1671 * Join the #mal IRC channel. It's fairly quiet but there are bursts of
1672 interesting conversation related to mal, Lisps, esoteric programming
1673 languages, etc.
1674 * If you have created an implementation for a new target language (or
1675 a unique and interesting variant of an existing implementation),
1676 consider sending a pull request to add it into the main mal
1677 repository. The [FAQ](../docs/FAQ.md#will-you-add-my-new-implementation)
1678 describes general requirements for getting an implementation merged
1679 into the main repository.
1680 * Take your interpreter implementation and have it emit source code in
1681 the target language rather than immediately evaluating it. In other
1682 words, create a compiler.
1683 * Pick a new target language and implement mal in it. Pick a language
1684 that is very different from any that you already know.
1685 * Use your mal implementation to implement a real world project. Many
1686 of these will force you to address interop. Some ideas:
1687 * Web server (with mal as CGI language for extra points)
1688 * An IRC/Slack chat bot
1689 * An editor (GUI or curses) with mal as a scripting/extension
1690 language.
1691 * An AI player for a game like Chess or Go.
1692 * Implement a feature in your mal implementation that is not covered
1693 by this guide. Some ideas:
1694 * Namespaces
1695 * Multi-threading support
1696 * Errors with line numbers and/or stack traces.
1697 * Lazy sequences
1698 * Clojure-style protocols
1699 * Full call/cc (call-with-current-continuation) support
1700 * Explicit TCO (i.e. `recur`) with tail-position error checking