Merge pull request #216 from hinrik/add_perl6_impl
[jackhill/mal.git] / process / guide.md
1 # The Make-A-Lisp Process
2
3 So you want to write a Lisp interpreter? Welcome!
4
5 The goal of the Make-A-Lisp project is to make it easy to write your
6 own Lisp interpreter without sacrificing those many "Aha!" moments
7 that come from ascending the McCarthy mountain. When you reach the peak
8 of this particular mountain, you will have an interpreter for the mal
9 Lisp language that is powerful enough to be self-hosting, meaning it
10 will be able to run a mal interpreter written in mal itself.
11
12 So jump right in (er ... start the climb)!
13
14 - [Pick a language](#pick-a-language)
15 - [Getting started](#getting-started)
16 - [General hints](#general-hints)
17 - [The Make-A-Lisp Process](#the-make-a-lisp-process-1)
18 - [Step 0: The REPL](#step-0-the-repl)
19 - [Step 1: Read and Print](#step-1-read-and-print)
20 - [Step 2: Eval](#step-2-eval)
21 - [Step 3: Environments](#step-3-environments)
22 - [Step 4: If Fn Do](#step-4-if-fn-do)
23 - [Step 5: Tail call optimization](#step-5-tail-call-optimization)
24 - [Step 6: Files, Mutation, and Evil](#step-6-files-mutation-and-evil)
25 - [Step 7: Quoting](#step-7-quoting)
26 - [Step 8: Macros](#step-8-macros)
27 - [Step 9: Try](#step-9-try)
28 - [Step A: Metadata, Self-hosting and Interop](#step-a-metadata-self-hosting-and-interop)
29
30
31 ## Pick a language
32
33 You might already have a language in mind that you want to use.
34 Technically speaking, mal can be implemented in any sufficiently
35 complete programming language (i.e. Turing complete), however, there are a few
36 language features that can make the task MUCH easier. Here are some of
37 them in rough order of importance:
38
39 * A sequential compound data structure (e.g. arrays, lists,
40 vectors, etc)
41 * An associative compound data structure (e.g. a dictionary,
42 hash-map, associative array, etc)
43 * Function references (first class functions, function pointers,
44 etc)
45 * Real exception handling (try/catch, raise, throw, etc)
46 * Variable argument functions (variadic, var args, splats, apply, etc)
47 * Function closures
48 * PCRE regular expressions
49
50 In addition, the following will make your task especially easy:
51
52 * Dynamic typing / boxed types (specifically, the ability to store
53 different data types in the sequential and associative structures
54 and the language keeps track of the type for you)
55 * Compound data types support arbitrary runtime "hidden" data
56 (metadata, metatables, dynamic fields attributes)
57
58 Here are some examples of languages that have all of the above
59 features: JavaScript, Ruby, Python, Lua, R, Clojure.
60
61 Michael Fogus has some great blog posts on interesting but less well
62 known languages and many of the languages on his lists do not yet have
63 any mal implementations:
64 * http://blog.fogus.me/2011/08/14/perlis-languages/
65 * http://blog.fogus.me/2011/10/18/programming-language-development-the-past-5-years/
66
67 Many of the most popular languages already have Mal implementations.
68 However, this should not discourage you from creating your own
69 implementation in a language that already has one. However, if you go
70 this route, I suggest you avoid referring to the existing
71 implementations (i.e. "cheating") to maximize your learning experience
72 instead of just borrowing mine. On the other hand, if your goal is to
73 add new implementations to mal as efficiently as possible, then you
74 SHOULD find the most similar target language implementation and refer
75 to it frequently.
76
77 If you want a fairly long list of programming languages with an
78 approximate measure of popularity, try the [Programming Language
79 Popularity Chart](http://langpop.corger.nl/)
80
81
82 ## Getting started
83
84 * Install your chosen language interpreter/compiler, language package
85 manager and build tools (if applicable)
86
87 * Fork the mal repository on github and then clone your forked
88 repository:
89 ```
90 git clone git@github.com:YOUR_NAME/mal.git
91 cd mal
92 ```
93
94 * Make a new directory for your implementation. For example, if your
95 language is called "quux":
96 ```
97 mkdir quux
98 ```
99
100 * Modify the top level Makefile to allow the tests to be run against
101 your implementation. For example, if your language is named "quux"
102 and uses "qx" as the file extension, then make the following
103 3 modifications to Makefile:
104 ```
105 IMPLS = ... quux ...
106 ...
107 quux_STEP_TO_PROG = mylang/$($(1)).qx
108 ```
109
110 * Add a "run" script to you implementation directory that listens to
111 the "STEP" environment variable for the implementation step to run
112 and defaults to "stepA_mal". The following are examples of "run"
113 scripts for a compiled language and an interpreted language (where
114 the interpreter is named "quux"):
115
116 ```
117 #!/bin/bash
118 exec $(dirname $0)/${STEP:-stepA_mal} "${@}"
119 ```
120
121 ```
122 #!/bin/bash
123 exec quux $(dirname $0)/${STEP:-stepA_mal}.qx "${@}"
124 ```
125
126 This allows you to run tests against your implementation like this:
127 ```
128 make "test^quux^stepX"
129 ```
130
131 If your implementation language is a compiled language, then you
132 should also add a Makefile at the top level of your implementation
133 directory. This Makefile will define how to build the files pointed to
134 by the quux_STEP_TO_PROG macro. The top-level Makefile will attempt to
135 build those targets before running tests. If it is a scripting
136 language/uncompiled, then no Makefile is necessary because
137 quux_STEP_TO_PROG will point to a source file that already exists and
138 does not need to be compiled/built.
139
140
141 ## General hints
142
143 Stackoverflow and Google are your best friends. Modern polyglot
144 developers do not memorize dozens of programming languages. Instead,
145 they learn the peculiar terminology used with each language and then
146 use this to search for their answers.
147
148 Here are some other resources where multiple languages are
149 compared/described:
150 * http://learnxinyminutes.com/
151 * http://hyperpolyglot.org/
152 * http://rosettacode.org/
153 * http://rigaux.org/language-study/syntax-across-languages/
154
155 Do not let yourself be bogged down by specific problems. While the
156 make-a-lisp process is structured as a series of steps, the reality is
157 that building a lisp interpreter is more like a branching tree. If you
158 get stuck on tail call optimization, or hash-maps, move on to other
159 things. You will often have a stroke of inspiration for a problem as
160 you work through other functionality. I have tried to structure this
161 guide and the tests to make clear which things can be deferred until
162 later.
163
164 An aside on deferrable/optional bits: when you run the tests for
165 a given step, the last tests are often marked with an "optional"
166 header. This indicates that these are tests for functionality that is
167 not critical to finish a basic mal implementation. Many of the steps
168 in this process guide have a "Deferrable" section, however, it is not
169 quite the same meaning. Those sections include the functionality that
170 is marked as optional in the tests, but they also include
171 functionality that becomes mandatory at a later step. In other words,
172 this is a "make your own Lisp adventure".
173
174 Use test driven development. Each step of the make-a-lisp process has
175 a bunch of tests associated with it and there is an easy script to run
176 all the tests for a specific step in the process. Pick a failing test,
177 fix it, repeat until all the tests for that step pass.
178
179 ## Reference Code
180
181 The `process` directory contains abbreviated pseudocode and
182 architecture diagrams for each step of the make-a-lisp process. Use
183 a textual diff/comparison tool to compare the previous pseudocode step
184 with the one you are working on. The architecture diagram images have
185 changes from the previous step highlighted in red. There is also
186 a concise
187 [cheatsheet](http://kanaka.github.io/mal/process/cheatsheet.html) that
188 summarizes the key changes at each step.
189
190 If you get completely stuck and are feeling like giving up, then you
191 should "cheat" by referring to the same step or functionality in
192 a existing implementation language. You are here to learn, not to take
193 a test, so do not feel bad about it. Okay, you should feel a little
194 bit bad about it.
195
196
197 ## The Make-A-Lisp Process
198
199 In the steps that follow the name of the target language is "quux" and
200 the file extension for that language is "qx".
201
202
203 <a name="step0"></a>
204
205 ### Step 0: The REPL
206
207 ![step0_repl architecture](step0_repl.png)
208
209 This step is basically just creating a skeleton of your interpreter.
210
211 * Create a `step0_repl.qx` file in `quux/`.
212
213 * Add the 4 trivial functions `READ`, `EVAL`, `PRINT`, and `rep`
214 (read-eval-print). `READ`, `EVAL`, and `PRINT` are basically just
215 stubs that return their first parameter (a string if your target
216 language is a statically typed) and `rep` calls them in order
217 passing the return to the input of the next.
218
219 * Add a main loop that repeatedly prints a prompt (needs to be
220 "user> " for later tests to pass), gets a line of input from the
221 user, calls `rep` with that line of input, and then prints out the
222 result from `rep`. It should also exit when you send it an EOF
223 (often Ctrl-D).
224
225 * If you are using a compiled (ahead-of-time rather than just-in-time)
226 language, then create a Makefile (or appropriate project definition
227 file) in your directory.
228
229 It is time to run your first tests. This will check that your program
230 does input and output in a way that can be captured by the test
231 harness. Go to the top level and run the following:
232 ```
233 make "test^quux^step0"
234 ```
235
236 Add and then commit your new `step0_repl.qx` and `Makefile` to git.
237
238 Congratulations! You have just completed the first step of the
239 make-a-lisp process.
240
241
242 #### Optional:
243
244 * Add full line editing and command history support to your
245 interpreter REPL. Many languages have a library/module that provide
246 line editing support. Another option if your language supports it is
247 to use an FFI (foreign function interface) to load and call directly
248 into GNU readline, editline, or linenoise library. Add line
249 editing interface code to `readline.qx`
250
251
252 <a name="step1"></a>
253
254 ### Step 1: Read and Print
255
256 ![step1_read_print architecture](step1_read_print.png)
257
258 In this step, your interpreter will "read" the string from the user
259 and parse it into an internal tree data structure (an abstract syntax
260 tree) and then take that data structure and "print" it back to
261 a string.
262
263 In non-lisp languages, this step (called "lexing and parsing") can be
264 one of the most complicated parts of the compiler/interpreter. In
265 Lisp, the data structure that you want in memory is basically
266 represented directly in the code that the programmer writes
267 (homoiconicity).
268
269 For example, if the string is "(+ 2 (* 3 4))" then the read function
270 will process this into a tree structure that looks like this:
271 ```
272 List
273 / | \
274 / | \
275 / | \
276 Sym:+ Int:2 List
277 / | \
278 / | \
279 / | \
280 Sym:* Int:3 Int:4
281 ```
282
283 Each left paren and its matching right paren (lisp "sexpr") becomes
284 a node in the tree and everything else becomes a leaf in the tree.
285
286 If you can find code for an implementation of a JSON encoder/decoder
287 in your target language then you can probably just borrow and modify
288 that and be 75% of the way done with this step.
289
290 The rest of this section is going to assume that you are not starting
291 from an existing JSON encoder/decoder, but that you do have access to
292 a Perl compatible regular expressions (PCRE) module/library. You can
293 certainly implement the reader using simple string operations, but it
294 is more involved. The `make`, `ps` (postscript) and Haskell
295 implementations have examples of a reader/parser without using regular
296 expression support.
297
298 * Copy `step0_repl.qx` to `step1_read_print.qx`.
299
300 * Add a `reader.qx` file to hold functions related to the reader.
301
302 * If the target language has objects types (OOP), then the next step
303 is to create a simple stateful Reader object in `reader.qx`. This
304 object will store the tokens and a position. The Reader object will
305 have two methods: `next` and `peek`. `next` returns the token at
306 the current position and increments the position. `peek` just
307 returns the token at the current position.
308
309 * Add a function `read_str` in `reader.qx`. This function
310 will call `tokenizer` and then create a new Reader object instance
311 with the tokens. Then it will call `read_form` with the Reader
312 instance.
313
314 * Add a function `tokenizer` in `reader.qx`. This function will take
315 a single string and return an array/list
316 of all the tokens (strings) in it. The following regular expression
317 (PCRE) will match all mal tokens.
318 ```
319 [\s,]*(~@|[\[\]{}()'`~^@]|"(?:\\.|[^\\"])*"|;.*|[^\s\[\]{}('"`,;)]*)
320 ```
321 * For each match captured within the parenthesis starting at char 6 of the
322 regular expression a new token will be created.
323
324 * `[\s,]*`: Matches any number of whitespaces or commas. This is not captured
325 so it will be ignored and not tokenized.
326
327 * `~@`: Captures the special two-characters `~@` (tokenized).
328
329 * ```[\[\]{}()'`~^@]```: Captures any special single character, one of
330 ```[]{}'`~^@``` (tokenized).
331
332 * `"(?:\\.|[^\\"])*"`: Starts capturing at a double-quote and stops at the
333 next double-quote unless it was proceeded by a backslash in which case it
334 includes it until the next double-quote (tokenized).
335
336 * `;.*`: Captures any sequence of characters starting with `;` (tokenized).
337
338 * ```[^\s\[\]{}('"`,;)]*```: Captures a sequence of zero or more non special
339 characters (e.g. symbols, numbers, "true", "false", and "nil") and is sort
340 of the inverse of the one above that captures special characters (tokenized).
341
342 * Add the function `read_form` to `reader.qx`. This function
343 will peek at the first token in the Reader object and switch on the
344 first character of that token. If the character is a left paren then
345 `read_list` is called with the Reader object. Otherwise, `read_atom`
346 is called with the Reader Object. The return value from `read_form`
347 is a mal data type. If your target language is statically typed then
348 you will need some way for `read_form` to return a variant or
349 subclass type. For example, if your language is object oriented,
350 then you can define a top level MalType (in `types.qx`) that all
351 your mal data types inherit from. The MalList type (which also
352 inherits from MalType) will contains a list/array of other MalTypes.
353 If your language is dynamically typed then you can likely just
354 return a plain list/array of other mal types.
355
356 * Add the function `read_list` to `reader.qx`. This function will
357 repeatedly call `read_form` with the Reader object until it
358 encounters a ')' token (if it reach EOF before reading a ')' then
359 that is an error). It accumulates the results into a List type. If
360 your language does not have a sequential data type that can hold mal
361 type values you may need to implement one (in `types.qx`). Note
362 that `read_list` repeatedly calls `read_form` rather than
363 `read_atom`. This mutually recursive definition between `read_list`
364 and `read_form` is what allows lists to contain lists.
365
366 * Add the function `read_atom` to `reader.qx`. This function will
367 look at the contents of the token and return the appropriate scalar
368 (simple/single) data type value. Initially, you can just implement
369 numbers (integers) and symbols. This will allow you to proceed
370 through the next couple of steps before you will need to implement
371 the other fundamental mal types: nil, true, false, and string. The
372 remaining mal types: keyword, vector, hash-map, and atom do not
373 need to be implemented until step 9 (but can be implemented at any
374 point between this step and that). BTW, symbols types are just an
375 object that contains a single string name value (some languages have
376 symbol types already).
377
378 * Add a file `printer.qx`. This file will contain a single function
379 `pr_str` which does the opposite of `read_str`: take a mal data
380 structure and return a string representation of it. But `pr_str` is
381 much simpler and is basically just a switch statement on the type of
382 the input object:
383
384 * symbol: return the string name of the symbol
385 * number: return the number as a string
386 * list: iterate through each element of the list calling `pr_str` on
387 it, then join the results with a space separator, and surround the
388 final result with parens
389
390 * Change the `READ` function in `step1_read_print.qx` to call
391 `reader.read_str` and the `PRINT` function to call `printer.pr_str`.
392 `EVAL` continues to simply return its input but the type is now
393 a mal data type.
394
395 You now have enough hooked up to begin testing your code. You can
396 manually try some simple inputs:
397 * `123` -> `123`
398 * ` 123 ` -> `123`
399 * `abc` -> `abc`
400 * ` abc ` -> `abc`
401 * `(123 456)` -> `(123 456)`
402 * `( 123 456 789 ) ` -> `(123 456 789)`
403 * `( + 2 (* 3 4) ) ` -> `(+ 2 (* 3 4))`
404
405 To verify that your code is doing more than just eliminating extra
406 spaces (and not failing), you can instrument your `reader.qx` functions.
407
408 Once you have gotten past those simple manual tests, it is time to run
409 the full suite of step 1 tests. Go to the top level and run the
410 following:
411 ```
412 make "test^quux^step1"
413 ```
414
415 Fix any test failures related to symbols, numbers and lists.
416
417 Depending on the functionality of your target language, it is likely
418 that you have now just completed one of the most difficult steps. It
419 is down hill from here. The remaining steps will probably be easier
420 and each step will give progressively more bang for the buck.
421
422 #### Deferrable:
423
424
425 * Add support for the other basic data type to your reader and printer
426 functions: string, nil, true, and false. These become mandatory at
427 step 4. When a string is read, the following transformations are
428 applied: a backslash followed by a doublequote is translated into
429 a plain doublequote character, a backslash followed by "n" is
430 translated into a newline, and a backslash followed by another
431 backslash is translated into a single backslash. To properly print
432 a string (for step 4 string functions), the `pr_str` function needs
433 another parameter called `print_readably`. When `print_readably` is
434 true, doublequotes, newlines, and backslashes are translated into
435 their printed representations (the reverse of the reader). The
436 `PRINT` function in the main program should call `pr_str` with
437 print_readably set to true.
438
439 * Add error checking to your reader functions to make sure parens
440 are properly matched. Catch and print these errors in your main
441 loop. If your language does not have try/catch style bubble up
442 exception handling, then you will need to add explicit error
443 handling to your code to catch and pass on errors without crashing.
444
445 * Add support for reader macros which are forms that are
446 transformed into other forms during the read phase. Refer to
447 `tests/step1_read_print.mal` for the form that these macros should
448 take (they are just simple transformations of the token stream).
449
450 * Add support for the other mal types: keyword, vector, hash-map.
451 * keyword: a keyword is a token that begins with a colon. A keyword
452 can just be stored as a string with special unicode prefix like
453 0x29E (or char 0xff/127 if the target language does not have good
454 unicode support) and the printer translates strings with that
455 prefix back to the keyword representation. This makes it easy to
456 use keywords as hash map keys in most languages. You can also
457 store keywords as a unique data type, but you will need to make
458 sure they can be used as hash map keys (which may involve doing
459 a similar prefixed translation anyways).
460 * vector: a vector can be implemented with same underlying
461 type as a list as long as there is some mechanism to keep track of
462 the difference. You can use the same reader function for both
463 lists and vectors by adding parameters for the starting and ending
464 tokens.
465 * hash-map: a hash-map is an associative data structure that maps
466 strings to other mal values. If you implement keywords as prefixed
467 strings, then you only need a native associative data structure
468 which supports string keys. Clojure allows any value to be a hash
469 map key, but the base functionality in mal is to support strings
470 and keyword keys. Because of the representation of hash-maps as
471 an alternating sequence of keys and values, you can probably use
472 the same reader function for hash-maps as lists and vectors with
473 parameters to indicate the starting and ending tokens. The odd
474 tokens are then used for keys with the corresponding even tokens
475 as the values.
476
477 * Add comment support to your reader. The tokenizer should ignore
478 tokens that start with ";". Your `read_str` function will need to
479 properly handle when the tokenizer returns no values. The simplest
480 way to do this is to return `nil` mal value. A cleaner option (that
481 does not print `nil` at the prompt is to throw a special exception
482 that causes the main loop to simply continue at the beginning of the
483 loop without calling `rep`.
484
485
486 <a name="step2"></a>
487
488 ### Step 2: Eval
489
490 ![step2_eval architecture](step2_eval.png)
491
492 In step 1 your mal interpreter was basically just a way to validate
493 input and eliminate extraneous white space. In this step you will turn
494 your interpreter into a simple number calculator by adding
495 functionality to the evaluator (`EVAL`).
496
497 Compare the pseudocode for step 1 and step 2 to get a basic idea of
498 the changes that will be made during this step:
499 ```
500 diff -urp ../process/step1_read_print.txt ../process/step2_eval.txt
501 ```
502
503 * Copy `step1_read_print.qx` to `step2_eval.qx`.
504
505 * Define a simple initial REPL environment. This environment is an
506 associative structure that maps symbols (or symbol names) to
507 numeric functions. For example, in python this would look something
508 like this:
509 ```
510 repl_env = {'+': lambda a,b: a+b,
511 '-': lambda a,b: a-b,
512 '*': lambda a,b: a*b,
513 '/': lambda a,b: int(a/b)}
514 ```
515
516 * Modify the `rep` function to pass the REPL environment as the second
517 parameter for the `EVAL` call.
518
519 * Create a new function `eval_ast` which takes `ast` (mal data type)
520 and an associative structure (the environment from above).
521 `eval_ast` switches on the type of `ast` as follows:
522
523 * symbol: lookup the symbol in the environment structure and return
524 the value or raise an error no value is found
525 * list: return a new list that is the result of calling `EVAL` on
526 each of the members of the list
527 * otherwise just return the original `ast` value
528
529 * Modify `EVAL` to check if the first parameter `ast` is a list.
530 * `ast` is not a list: then return the result of calling `eval_ast`
531 on it.
532 * `ast` is a empty list: return ast unchanged.
533 * `ast` is a list: call `eval_ast` to get a new evaluated list. Take
534 the first item of the evaluated list and call it as function using
535 the rest of the evaluated list as its arguments.
536
537 If your target language does not have full variable length argument
538 support (e.g. variadic, vararg, splats, apply) then you will need to
539 pass the full list of arguments as a single parameter and split apart
540 the individual values inside of every mal function. This is annoying,
541 but workable.
542
543 The process of taking a list and invoking or executing it to return
544 something new is known in Lisp as the "apply" phase.
545
546 Try some simple expressions:
547
548 * `(+ 2 3)` -> `5`
549 * `(+ 2 (* 3 4))` -> `14`
550
551 The most likely challenge you will encounter is how to properly call
552 a function references using an arguments list.
553
554 Now go to the top level, run the step 2 tests and fix the errors.
555 ```
556 make "test^quux^step2"
557 ```
558
559 You now have a simple prefix notation calculator!
560
561 #### Deferrable:
562
563 * `eval_ast` should evaluate elements of vectors and hash-maps. Add the
564 following cases in `eval_ast`:
565 * If `ast` is a vector: return a new vector that is the result of calling
566 `EVAL` on each of the members of the vector.
567 * If `ast` is a hash-map: return a new hash-map which consists of key-value
568 pairs where the key is a key from the hash-map and the value is the result
569 of calling `EVAL` on the corresponding value.
570
571
572 <a name="step3"></a>
573
574 ### Step 3: Environments
575
576 ![step3_env architecture](step3_env.png)
577
578 In step 2 you were already introduced to REPL environment (`repl_env`)
579 where the basic numeric functions were stored and looked up. In this
580 step you will add the ability to create new environments (`let*`) and
581 modify existing environments (`def!`).
582
583 A Lisp environment is an associative data structure that maps symbols (the
584 keys) to values. But Lisp environments have an additional important
585 function: they can refer to another environment (the outer
586 environment). During environment lookups, if the current environment
587 does not have the symbol, the lookup continues in the outer
588 environment, and continues this way until the symbol is either found,
589 or the outer environment is `nil` (the outermost environment in the
590 chain).
591
592 Compare the pseudocode for step 2 and step 3 to get a basic idea of
593 the changes that will be made during this step:
594 ```
595 diff -urp ../process/step2_eval.txt ../process/step3_env.txt
596 ```
597
598 * Copy `step2_eval.qx` to `step3_env.qx`.
599
600 * Create `env.qx` to hold the environment definition.
601
602 * Define an `Env` object that is instantiated with a single `outer`
603 parameter and starts with an empty associative data structure
604 property `data`.
605
606 * Define three methods for the Env object:
607 * set: takes a symbol key and a mal value and adds to the `data`
608 structure
609 * find: takes a symbol key and if the current environment contains
610 that key then return the environment. If no key is found and outer
611 is not `nil` then call find (recurse) on the outer environment.
612 * get: takes a symbol key and uses the `find` method to locate the
613 environment with the key, then returns the matching value. If no
614 key is found up the outer chain, then throws/raises a "not found"
615 error.
616
617 * Update `step3_env.qx` to use the new `Env` type to create the
618 repl_env (with a `nil` outer value) and use the `set` method to add
619 the numeric functions.
620
621 * Modify `eval_ast` to call the `get` method on the `env` parameter.
622
623 * Modify the apply section of `EVAL` to switch on the first element of
624 the list:
625 * symbol "def!": call the set method of the current environment
626 (second parameter of `EVAL` called `env`) using the unevaluated
627 first parameter (second list element) as the symbol key and the
628 evaluated second parameter as the value.
629 * symbol "let\*": create a new environment using the current
630 environment as the outer value and then use the first parameter as
631 a list of new bindings in the "let\*" environment. Take the second
632 element of the binding list, call `EVAL` using the new "let\*"
633 environment as the evaluation environment, then call `set` on the
634 "let\*" environment using the first binding list element as the key
635 and the evaluated second element as the value. This is repeated
636 for each odd/even pair in the binding list. Note in particular,
637 the bindings earlier in the list can be referred to by later
638 bindings. Finally, the second parameter (third element) of the
639 original `let*` form is evaluated using the new "let\*" environment
640 and the result is returned as the result of the `let*` (the new
641 let environment is discarded upon completion).
642 * otherwise: call `eval_ast` on the list and apply the first element
643 to the rest as before.
644
645 `def!` and `let*` are Lisp "specials" (or "special atoms") which means
646 that they are language level features and more specifically that the
647 rest of the list elements (arguments) may be evaluated differently (or
648 not at all) unlike the default apply case where all elements of the
649 list are evaluated before the first element is invoked. Lists which
650 contain a "special" as the first element are known as "special forms".
651 They are special because they follow special evaluation rules.
652
653 Try some simple environment tests:
654
655 * `(def! a 6)` -> `6`
656 * `a` -> `6`
657 * `(def! b (+ a 2))` -> `8`
658 * `(+ a b)` -> `14`
659 * `(let* (c 2) c)` -> `2`
660
661 Now go to the top level, run the step 3 tests and fix the errors.
662 ```
663 make "test^quux^step3"
664 ```
665
666 You mal implementation is still basically just a numeric calculator
667 with save/restore capability. But you have set the foundation for step
668 4 where it will begin to feel like a real programming language.
669
670
671 An aside on mutation and typing:
672
673 The "!" suffix on symbols is used to indicate that this symbol refers
674 to a function that mutates something else. In this case, the `def!`
675 symbol indicates a special form that will mutate the current
676 environment. Many (maybe even most) of runtime problems that are
677 encountered in software engineering are a result of mutation. By
678 clearly marking code where mutation may occur, you can more easily
679 track down the likely cause of runtime problems when they do occur.
680
681 Another cause of runtime errors is type errors, where a value of one
682 type is unexpectedly treated by the program as a different and
683 incompatible type. Statically typed languages try to make the
684 programmer solve all type problems before the program is allowed to
685 run. Most Lisp variants tend to be dynamically typed (types of values
686 are checked when they are actually used at runtime).
687
688 As an aside-aside: The great debate between static and dynamic typing
689 can be understood by following the money. Advocates of strict static
690 typing use words like "correctness" and "safety" and thus get
691 government and academic funding. Advocates of dynamic typing use words
692 like "agile" and "time-to-market" and thus get venture capital and
693 commercial funding.
694
695
696 <a name="step4"></a>
697
698 ### Step 4: If Fn Do
699
700 ![step4_if_fn_do architecture](step4_if_fn_do.png)
701
702 In step 3 you added environments and the special forms for
703 manipulating environments. In this step you will add 3 new special
704 forms (`if`, `fn*` and `do`) and add several more core functions to
705 the default REPL environment. Our new architecture will look like
706 this:
707
708 The `fn*` special form is how new user-defined functions are created.
709 In some Lisps, this special form is named "lambda".
710
711 Compare the pseudocode for step 3 and step 4 to get a basic idea of
712 the changes that will be made during this step:
713 ```
714 diff -urp ../process/step3_env.txt ../process/step4_if_fn_do.txt
715 ```
716
717 * Copy `step3_env.qx` to `step4_if_fn_do.qx`.
718
719 * If you have not implemented reader and printer support (and data
720 types) for `nil`, `true` and `false`, you will need to do so for
721 this step.
722
723 * Update the constructor/initializer for environments to take two new
724 arguments: `binds` and `exprs`. Bind (`set`) each element (symbol)
725 of the binds list to the respective element of the `exprs` list.
726
727 * Add support to `printer.qx` to print functions values. A string
728 literal like "#<function>" is sufficient.
729
730 * Add the following special forms to `EVAL`:
731
732 * `do`: Evaluate all the elements of the list using `eval_ast`
733 and return the final evaluated element.
734 * `if`: Evaluate the first parameter (second element). If the result
735 (condition) is anything other than `nil` or `false`, then evaluate
736 the second parameter (third element of the list) and return the
737 result. Otherwise, evaluate the third parameter (fourth element)
738 and return the result. If condition is false and there is no third
739 parameter, then just return `nil`.
740 * `fn*`: Return a new function closure. The body of that closure
741 does the following:
742 * Create a new environment using `env` (closed over from outer
743 scope) as the `outer` parameter, the first parameter (second
744 list element of `ast` from the outer scope) as the `binds`
745 parameter, and the parameters to the closure as the `exprs`
746 parameter.
747 * Call `EVAL` on the second parameter (third list element of `ast`
748 from outer scope), using the new environment. Use the result as
749 the return value of the closure.
750
751 If your target language does not support closures, then you will need
752 to implement `fn*` using some sort of structure or object that stores
753 the values being closed over: the first and second elements of the
754 `ast` list (function parameter list and function body) and the current
755 environment `env`. In this case, your native functions will need to be
756 wrapped in the same way. You will probably also need a method/function
757 that invokes your function object/structure for the default case of
758 the apply section of `EVAL`.
759
760 Try out the basic functionality you have implemented:
761
762 * `(fn* [a] a)` -> `#<function>`
763 * `( (fn* [a] a) 7)` -> `7`
764 * `( (fn* [a] (+ a 1)) 10)` -> `11`
765 * `( (fn* [a b] (+ a b)) 2 3)` -> `5`
766
767 * Add a new file `core.qx` and define an associative data structure
768 `ns` (namespace) that maps symbols to functions. Move the numeric
769 function definitions into this structure.
770
771 * Modify `step4_if_fn_do.qx` to iterate through the `core.ns`
772 structure and add (`set`) each symbol/function mapping to the
773 REPL environment (`repl_env`).
774
775 * Add the following functions to `core.ns`:
776 * `prn`: call `pr_str` on the first parameter with `print_readably`
777 set to true, prints the result to the screen and then return
778 `nil`. Note that the full version of `prn` is a deferrable below.
779 * `list`: take the parameters and return them as a list.
780 * `list?`: return true if the first parameter is a list, false
781 otherwise.
782 * `empty?`: treat the first parameter as a list and return true if
783 the list is empty and false if it contains any elements.
784 * `count`: treat the first parameter as a list and return the number
785 of elements that it contains.
786 * `=`: compare the first two parameters and return true if they are
787 the same type and contain the same value. In the case of equal
788 length lists, each element of the list should be compared for
789 equality and if they are the same return true, otherwise false.
790 * `<`, `<=`, `>`, and `>=`: treat the first two parameters as
791 numbers and do the corresponding numeric comparison, returning
792 either true or false.
793
794 Now go to the top level, run the step 4 tests. There are a lot of
795 tests in step 4 but all of the non-optional tests that do not involve
796 strings should be able to pass now.
797
798 ```
799 make "test^quux^step4"
800 ```
801
802 Your mal implementation is already beginning to look like a real
803 language. You have flow control, conditionals, user-defined functions
804 with lexical scope, side-effects (if you implement the string
805 functions), etc. However, our little interpreter has not quite reached
806 Lisp-ness yet. The next several steps will take your implementation
807 from a neat toy to a full featured language.
808
809 #### Deferrable:
810
811 * Implement Clojure-style variadic function parameters. Modify the
812 constructor/initializer for environments, so that if a "&" symbol is
813 encountered in the `binds` list, the next symbol in the `binds` list
814 after the "&" is bound to the rest of the `exprs` list that has not
815 been bound yet.
816
817 * Define a `not` function using mal itself. In `step4_if_fn_do.qx`
818 call the `rep` function with this string:
819 "(def! not (fn* (a) (if a false true)))".
820
821 * Implement the strings functions in `core.qx`. To implement these
822 functions, you will need to implement the string support in the
823 reader and printer (deferrable section of step 1). Each of the string
824 functions takes multiple mal values, prints them (`pr_str`) and
825 joins them together into a new string.
826 * `pr-str`: calls `pr_str` on each argument with `print_readably`
827 set to true, joins the results with " " and returns the new
828 string.
829 * `str`: calls `pr_str` on each argument with `print_readably` set
830 to false, concatenates the results together ("" separator), and
831 returns the new string.
832 * `prn`: calls `pr_str` on each argument with `print_readably` set
833 to true, joins the results with " ", prints the string to the
834 screen and then returns `nil`.
835 * `println`: calls `pr_str` on each argument with `print_readably` set
836 to false, joins the results with " ", prints the string to the
837 screen and then returns `nil`.
838
839
840 <a name="step5"></a>
841
842 ### Step 5: Tail call optimization
843
844 ![step5_tco architecture](step5_tco.png)
845
846 In step 4 you added special forms `do`, `if` and `fn*` and you defined
847 some core functions. In this step you will add a Lisp feature called
848 tail call optimization (TCO). Also called "tail recursion" or
849 sometimes just "tail calls".
850
851 Several of the special forms that you have defined in `EVAL` end up
852 calling back into `EVAL`. For those forms that call `EVAL` as the last
853 thing that they do before returning (tail call) you will just loop back
854 to the beginning of eval rather than calling it again. The advantage
855 of this approach is that it avoids adding more frames to the call
856 stack. This is especially important in Lisp languages because they tend
857 to prefer using recursion instead of iteration for control structures.
858 (Though some Lisps, such as Common Lisp, have iteration.) However, with
859 tail call optimization, recursion can be made as stack efficient as
860 iteration.
861
862 Compare the pseudocode for step 4 and step 5 to get a basic idea of
863 the changes that will be made during this step:
864 ```
865 diff -urp ../process/step4_if_fn_do.txt ../process/step5_tco.txt
866 ```
867
868 * Copy `step4_if_fn_do.qx` to `step5_tco.qx`.
869
870 * Add a loop (e.g. while true) around all code in `EVAL`.
871
872 * Modify each of the following form cases to add tail call recursion
873 support:
874 * `let*`: remove the final `EVAL` call on the second `ast` argument
875 (third list element). Set `env` (i.e. the local variable passed in
876 as second parameter of `EVAL`) to the new let environment. Set
877 `ast` (i.e. the local variable passed in as first parameter of
878 `EVAL`) to be the second `ast` argument. Continue at the beginning
879 of the loop (no return).
880 * `do`: change the `eval_ast` call to evaluate all the parameters
881 except for the last (2nd list element up to but not including
882 last). Set `ast` to the last element of `ast`. Continue
883 at the beginning of the loop (`env` stays unchanged).
884 * `if`: the condition continues to be evaluated, however, rather
885 than evaluating the true or false branch, `ast` is set to the
886 unevaluated value of the chosen branch. Continue at the beginning
887 of the loop (`env` is unchanged).
888
889 * The return value from the `fn*` special form will now become an
890 object/structure with attributes that allow the default invoke case
891 of `EVAL` to do TCO on mal functions. Those attributes are:
892 * `ast`: the second `ast` argument (third list element) representing
893 the body of the function.
894 * `params`: the first `ast` argument (second list element)
895 representing the parameter names of the function.
896 * `env`: the current value of the `env` parameter of `EVAL`.
897 * `fn`: the original function value (i.e. what was return by `fn*`
898 in step 4). Note that this is deferrable until step 9 when it is
899 needed for the `map` and `apply` core functions).
900
901 * The default "apply"/invoke case of `EVAL` must now be changed to
902 account for the new object/structure returned by the `fn*` form.
903 Continue to call `eval_ast` on `ast`. The first element is `f`.
904 Switch on the type of `f`:
905 * regular function (not one defined by `fn*`): apply/invoke it as
906 before (in step 4).
907 * a `fn*` value: set `ast` to the `ast` attribute of `f`. Generate
908 a new environment using the `env` and `params` attributes of `f`
909 as the `outer` and `binds` arguments and rest `ast` arguments
910 (list elements 2 through the end) as the `exprs` argument. Set
911 `env` to the new environment. Continue at the beginning of the loop.
912
913 Run some manual tests from previous steps to make sure you have not
914 broken anything by adding TCO.
915
916 Now go to the top level, run the step 5 tests.
917
918 ```
919 make "test^quux^step5"
920 ```
921
922 Look at the step 5 test file `tests/step5_tco.mal`. The `sum-to`
923 function cannot be tail call optimized because it does something after
924 the recursive call (`sum-to` calls itself and then does the addition).
925 Lispers say that the `sum-to` is not in tail position. The `sum2`
926 function however, calls itself from tail position. In other words, the
927 recursive call to `sum2` is the last action that `sum2` does. Calling
928 `sum-to` with a large value will cause a stack overflow exception in
929 most target languages (some have super-special tricks they use to
930 avoid stack overflows).
931
932 Congratulations, your mal implementation already has a feature (TCO)
933 that most mainstream languages lack.
934
935
936 <a name="step6"></a>
937
938 ### Step 6: Files, Mutation, and Evil
939
940 ![step6_file architecture](step6_file.png)
941
942 In step 5 you added tail call optimization. In this step you will add
943 some string and file operations and give your implementation a touch
944 of evil ... er, eval. And as long as your language supports function
945 closures, this step will be quite simple. However, to complete this
946 step, you must implement string type support, so if you have been
947 holding off on that you will need to go back and do so.
948
949 Compare the pseudocode for step 5 and step 6 to get a basic idea of
950 the changes that will be made during this step:
951 ```
952 diff -urp ../process/step5_tco.txt ../process/step6_file.txt
953 ```
954
955 * Copy `step5_tco.qx` to `step6_file.qx`.
956
957 * Add two new string functions to the core namespaces:
958 * `read-string`: this function just exposes the `read_str` function
959 from the reader. If your mal string type is not the same as your
960 target language (e.g. statically typed language) then your
961 `read-string` function will need to unbox (extract) the raw string
962 from the mal string type in order to call `read_str`.
963 * `slurp`: this function takes a file name (string) and returns the
964 contents of the file as a string. Once again, if your mal string
965 type wraps a raw target language string, then you will need to
966 unmarshall (extract) the string parameter to get the raw file name
967 string and marshall (wrap) the result back to a mal string type.
968
969 * In your main program, add a new symbol "eval" to your REPL
970 environment. The value of this new entry is a function that takes
971 a single argument `ast`. The closure calls the your `EVAL` function
972 using the `ast` as the first argument and the REPL environment
973 (closed over from outside) as the second argument. The result of
974 the `EVAL` call is returned. This simple but powerful addition
975 allows your program to treat mal data as a mal program. For example,
976 you can now to this:
977 ```
978 (def! mal-prog (list + 1 2))
979 (eval mal-prog)
980 ```
981
982 * Define a `load-file` function using mal itself. In your main
983 program call the `rep` function with this string:
984 "(def! load-file (fn* (f) (eval (read-string (str \"(do \" (slurp f) \")\")))))".
985
986 Try out `load-file`:
987 * `(load-file "../tests/incA.mal")` -> `9`
988 * `(inc4 3)` -> `7`
989
990 The `load-file` function does the following:
991 * Call `slurp` to read in a file by name. Surround the contents with
992 "(do ...)" so that the whole file will be treated as a single
993 program AST (abstract syntax tree).
994 * Call `read-string` on the string returned from `slurp`. This uses
995 the reader to read/convert the file contents into mal data/AST.
996 * Call `eval` (the one in the REPL environment) on the AST returned
997 from `read-string` to "run" it.
998
999 Besides adding file and eval support, we'll add support for the atom data type
1000 in this step. An atom is the Mal way to represent *state*; it is
1001 heavily inspired by [Clojure's atoms](http://clojure.org/state). An atom holds
1002 a reference to a single Mal value of any type; it supports reading that Mal value
1003 and *modifying* the reference to point to another Mal value. Note that this is
1004 the only Mal data type that is mutable (but the Mal values it refers to are
1005 still immutable; immutability is explained in greater detail in step 7).
1006 You'll need to add 5 functions to the core namespace to support atoms:
1007
1008 * `atom`: Takes a Mal value and returns a new atom which points to that Mal value.
1009 * `atom?`: Takes an argument and returns `true` if the argument is an atom.
1010 * `deref`: Takes an atom argument and returns the Mal value referenced by this atom.
1011 * `reset!`: Takes an atom and a Mal value; the atom is modified to refer to
1012 the given Mal value. The Mal value is returned.
1013 * `swap!`: Takes an atom, a function, and zero or more function arguments. The
1014 atom's value is modified to the result of applying the function with the atom's
1015 value as the first argument and the optionally given function arguments as
1016 the rest of the arguments. The new atom's value is returned. (Side note: Mal is
1017 single-threaded, but in concurrent languages like Clojure, `swap!` promises
1018 atomic update: `(swap! myatom (fn* [x] (+ 1 x)))` will always increase the
1019 `myatom` counter by one and will not suffer from missing updates when the
1020 atom is updated from multiple threads.)
1021
1022 Optionally, you can add a reader macro `@` which will serve as a short form for
1023 `deref`, so that `@a` is equivalent to `(deref a)`. In order to do that, modify
1024 the conditional in reader `read_form` function and add a case which deals with
1025 the `@` token: if the token is `@` (at sign) then return a new list that
1026 contains the symbol `deref` and the result of reading the next form
1027 (`read_form`).
1028
1029 Now go to the top level, run the step 6 tests. The optional tests will
1030 need support from the reader for comments, vectors, hash-maps and the `@`
1031 reader macro:
1032 ```
1033 make "test^quux^step6"
1034 ```
1035
1036 Congratulations, you now have a full-fledged scripting language that
1037 can run other mal programs. The `slurp` function loads a file as
1038 a string, the `read-string` function calls the mal reader to turn that
1039 string into data, and the `eval` function takes data and evaluates it
1040 as a normal mal program. However, it is important to note that the
1041 `eval` function is not just for running external programs. Because mal
1042 programs are regular mal data structures, you can dynamically generate
1043 or manipulate those data structures before calling `eval` on them.
1044 This isomorphism (same shape) between data and programs is known as
1045 "homoiconicity". Lisp languages are homoiconic and this property
1046 distinguishes them from most other programming languages.
1047
1048 You mal implementation is quite powerful already but the set of
1049 functions that are available (from `core.qx`) is fairly limited. The
1050 bulk of the functions you will add are described in step 9 and step A,
1051 but you will begin to flesh them out over the next few steps to
1052 support quoting (step 7) and macros (step 8).
1053
1054
1055 #### Deferrable:
1056
1057 * Add the ability to run another mal program from the command line.
1058 Prior to the REPL loop, check if your mal implementation is called
1059 with command line arguments. If so, treat the first argument as
1060 a filename and use `rep` to call `load-file` on that filename, and
1061 finally exit/terminate execution.
1062
1063 * Add the rest of the command line arguments to your REPL environment
1064 so that programs that are run with `load-file` have access to their
1065 calling environment. Add a new "\*ARGV\*" (symbol) entry to your REPL
1066 environment. The value of this entry should be the rest of the
1067 command line arguments as a mal list value.
1068
1069
1070 <a name="step7"></a>
1071
1072 ### Step 7: Quoting
1073
1074 ![step7_quote architecture](step7_quote.png)
1075
1076 In step 7 you will add the special forms `quote` and `quasiquote` and
1077 add supporting core functions `cons` and `concat`. The two quote forms
1078 add a powerful abstraction for manipulating mal code itself
1079 (meta-programming).
1080
1081 The `quote` special form indicates to the evaluator (`EVAL`) that the
1082 parameter should not be evaluated (yet). At first glance, this might
1083 not seem particularly useful but an example of what this enables is the
1084 ability for a mal program to refer to a symbol itself rather than the
1085 value that it evaluates to. Likewise with lists. For example, consider
1086 the following:
1087
1088 * `(prn abc)`: this will lookup the symbol `abc` in the current
1089 evaluation environment and print it. This will result in error if
1090 `abc` is not defined.
1091 * `(prn (quote abc))`: this will print "abc" (prints the symbol
1092 itself). This will work regardless of whether `abc` is defined in
1093 the current environment.
1094 * `(prn (1 2 3))`: this will result in an error because `1` is not
1095 a function and cannot be applied to the arguments `(2 3)`.
1096 * `(prn (quote (1 2 3)))`: this will print "(1 2 3)".
1097 * `(def! l (quote (1 2 3)))`: list quoting allows us to define lists
1098 directly in the code (list literal). Another way of doing this is
1099 with the list function: `(def! l (list 1 2 3))`.
1100
1101 The second special quoting form is `quasiquote`. This allows a quoted
1102 list to have internal elements of the list that are temporarily
1103 unquoted (normal evaluation). There are two special forms that only
1104 mean something within a quasiquoted list: `unquote` and
1105 `splice-unquote`. These are perhaps best explained with some examples:
1106
1107 * `(def! lst (quote (2 3)))` -> `(2 3)`
1108 * `(quasiquote (1 (unquote lst)))` -> `(1 (2 3))`
1109 * `(quasiquote (1 (splice-unquote lst)))` -> `(1 2 3)`
1110
1111 The `unquote` form turns evaluation back on for its argument and the
1112 result of evaluation is put in place into the quasiquoted list. The
1113 `splice-unquote` also turns evaluation back on for its argument, but
1114 the evaluated value must be a list which is then "spliced" into the
1115 quasiquoted list. The true power of the quasiquote form will be
1116 manifest when it is used together with macros (in the next step).
1117
1118 Compare the pseudocode for step 6 and step 7 to get a basic idea of
1119 the changes that will be made during this step:
1120 ```
1121 diff -urp ../process/step6_file.txt ../process/step7_quote.txt
1122 ```
1123
1124 * Copy `step6_file.qx` to `step7_quote.qx`.
1125
1126 * Before implementing the quoting forms, you will need to implement
1127 * some supporting functions in the core namespace:
1128 * `cons`: this function takes a list as its second
1129 parameter and returns a new list that has the first argument
1130 prepended to it.
1131 * `concat`: this functions takes 0 or more lists as
1132 parameters and returns a new list that is a concatenation of all
1133 the list parameters.
1134
1135 An aside on immutability: note that neither cons or concat mutate
1136 their original list arguments. Any references to them (i.e. other
1137 lists that they may be "contained" in) will still refer to the
1138 original unchanged value. Mal, like Clojure, is a language which uses
1139 immutable data structures. I encourage you to read about the power and
1140 importance of immutability as implemented in Clojure (from which
1141 Mal borrows most of its syntax and feature-set).
1142
1143 * Add the `quote` special form. This form just returns its argument
1144 (the second list element of `ast`).
1145
1146 * Add the `quasiquote` special form. First implement a helper function
1147 `is_pair` that returns true if the parameter is a non-empty list.
1148 Then define a `quasiquote` function. This is called from `EVAL` with
1149 the first `ast` argument (second list element) and then `ast` is set
1150 to the result and execution continues at the top of the loop (TCO).
1151 The `quasiquote` function takes a parameter `ast` and has the
1152 following conditional:
1153 1. if `is_pair` of `ast` is false: return a new list containing:
1154 a symbol named "quote" and `ast`.
1155 2. else if the first element of `ast` is a symbol named "unquote":
1156 return the second element of `ast`.
1157 3. if `is_pair` of the first element of `ast` is true and the first
1158 element of first element of `ast` (`ast[0][0]`) is a symbol named
1159 "splice-unquote": return a new list containing: a symbol named
1160 "concat", the second element of first element of `ast`
1161 (`ast[0][1]`), and the result of calling `quasiquote` with the
1162 second through last element of `ast`.
1163 4. otherwise: return a new list containing: a symbol named "cons", the
1164 result of calling `quasiquote` on first element of `ast`
1165 (`ast[0]`), and the result of calling `quasiquote` with the second
1166 through last element of `ast`.
1167
1168
1169 Now go to the top level, run the step 7 tests:
1170 ```
1171 make "test^quux^step7"
1172 ```
1173
1174 Quoting is one of the more mundane functions available in mal, but do
1175 not let that discourage you. Your mal implementation is almost
1176 complete, and quoting sets the stage for the next very exiting step:
1177 macros.
1178
1179
1180 #### Deferrable
1181
1182 * The full names for the quoting forms are fairly verbose. Most Lisp
1183 languages have a short-hand syntax and Mal is no exception. These
1184 short-hand syntaxes are known as reader macros because they allow us
1185 to manipulate mal code during the reader phase. Macros that run
1186 during the eval phase are just called "macros" and are described in
1187 the next section. Expand the conditional with reader `read_form`
1188 function to add the following four cases:
1189 * token is "'" (single quote): return a new list that contains the
1190 symbol "quote" and the result of reading the next form
1191 (`read_form`).
1192 * token is "\`" (back-tick): return a new list that contains the
1193 symbol "quasiquote" and the result of reading the next form
1194 (`read_form`).
1195 * token is "~" (tilde): return a new list that contains the
1196 symbol "unquote" and the result of reading the next form
1197 (`read_form`).
1198 * token is "~@" (tilde + at sign): return a new list that contains
1199 the symbol "splice-unquote" and the result of reading the next
1200 form (`read_form`).
1201
1202 * Add support for quoting of vectors. The `is_pair` function should
1203 return true if the argument is a non-empty list or vector. `cons`
1204 should also accept a vector as the second argument. The return value
1205 is a list regardless. `concat` should support concatenation of
1206 lists, vectors, or a mix or both. The result is always a list.
1207
1208
1209 <a name="step8"></a>
1210
1211 ### Step 8: Macros
1212
1213 ![step8_macros architecture](step8_macros.png)
1214
1215 Your mal implementation is now ready for one of the most lispy and
1216 exciting of all programming concepts: macros. In the previous step,
1217 quoting enabled some simple manipulation data structures and therefore
1218 manipulation of mal code (because the `eval` function from step
1219 6 turns mal data into code). In this step you will be able to mark mal
1220 functions as macros which can manipulate mal code before it is
1221 evaluated. In other words, macros are user-defined special forms. Or
1222 to look at it another way, macros allow mal programs to redefine
1223 the mal language itself.
1224
1225 Compare the pseudocode for step 7 and step 8 to get a basic idea of
1226 the changes that will be made during this step:
1227 ```
1228 diff -urp ../process/step7_quote.txt ../process/step8_macros.txt
1229 ```
1230
1231 * Copy `step7_quote.qx` to `step8_macros.qx`.
1232
1233
1234 You might think that the infinite power of macros would require some
1235 sort of complex mechanism, but the implementation is actually fairly
1236 simple.
1237
1238 * Add a new attribute `is_macro` to mal function types. This should
1239 default to false.
1240
1241 * Add a new special form `defmacro!`. This is very similar to the
1242 `def!` form, but before the evaluated value (mal function) is set in
1243 the environment, the `is_macro` attribute should be set to true.
1244
1245 * Add a `is_macro_call` function: This function takes arguments `ast`
1246 and `env`. It returns true if `ast` is a list that contains a symbol
1247 as the first element and that symbol refers to a function in the
1248 `env` environment and that function has the `is_macro` attribute set
1249 to true. Otherwise, it returns false.
1250
1251 * Add a `macroexpand` function: This function takes arguments `ast`
1252 and `env`. It calls `is_macro_call` with `ast` and `env` and loops
1253 while that condition is true. Inside the loop, the first element of
1254 the `ast` list (a symbol), is looked up in the environment to get
1255 the macro function. This macro function is then called/applied with
1256 the rest of the `ast` elements (2nd through the last) as arguments.
1257 The return value of the macro call becomes the new value of `ast`.
1258 When the loop completes because `ast` no longer represents a macro
1259 call, the current value of `ast` is returned.
1260
1261 * In the evaluator (`EVAL`) before the special forms switch (apply
1262 section), perform macro expansion by calling the `macroexpand`
1263 function with the current value of `ast` and `env`. Set `ast` to the
1264 result of that call. If the new value of `ast` is no longer a list
1265 after macro expansion, then return the result of calling `eval_ast`
1266 on it, otherwise continue with the rest of the apply section
1267 (special forms switch).
1268
1269 * Add a new special form condition for `macroexpand`. Call the
1270 `macroexpand` function using the first `ast` argument (second list
1271 element) and `env`. Return the result. This special form allows
1272 a mal program to do explicit macro expansion without applying the
1273 result (which can be useful for debugging macro expansion).
1274
1275 Now go to the top level, run the step 8 tests:
1276 ```
1277 make "test^quux^step8"
1278 ```
1279
1280 There is a reasonably good chance that the macro tests will not pass
1281 the first time. Although the implementation of macros is fairly
1282 simple, debugging runtime bugs with macros can be fairly tricky. If
1283 you do run into subtle problems that are difficult to solve, let me
1284 recommend a couple of approaches:
1285
1286 * Use the macroexpand special form to eliminate one of the layers of
1287 indirection (to expand but skip evaluate). This will often reveal
1288 the source of the issue.
1289 * Add a debug print statement to the top of your main `eval` function
1290 (inside the TCO loop) to print the current value of `ast` (hint use
1291 `pr_str` to get easier to debug output). Pull up the step8
1292 implementation from another language and uncomment its `eval`
1293 function (yes, I give you permission to violate the rule this once).
1294 Run the two side-by-side. The first difference is likely to point to
1295 the bug.
1296
1297 Congratulations! You now have a Lisp interpreter with a super power
1298 that most non-Lisp languages can only dream of (I have it on good
1299 authority that languages dream when you are not using them). If you
1300 are not already familiar with Lisp macros, I suggest the following
1301 exercise: write a recursive macro that handles postfixed mal code
1302 (with the function as the last parameter instead of the first). Or
1303 not. I have not actually done so myself, but I have heard it is an
1304 interesting exercise.
1305
1306 In the next step you will add try/catch style exception handling to
1307 your implementation in addition to some new core functions. After
1308 step9 you will be very close to having a fully self-hosting mal
1309 implementation. Let us continue!
1310
1311
1312 #### Deferrable
1313
1314 * Add the following new core functions which are frequently used in
1315 macro functions:
1316 * `nth`: this function takes a list (or vector) and a number (index)
1317 as arguments, returns the element of the list at the given index.
1318 If the index is out of range, this function raises an exception.
1319 * `first`: this function takes a list (or vector) as its argument
1320 and return the first element. If the list (or vector) is empty or
1321 is `nil` then `nil` is returned.
1322 * `rest`: this function takes a list (or vector) as its argument and
1323 returns a new list containing all the elements except the first.
1324
1325 * In the main program, use the `rep` function to define two new
1326 control structures macros. Here are the string arguments for `rep`
1327 to define these macros:
1328 * `cond`: "(defmacro! cond (fn* (& xs) (if (> (count xs) 0) (list 'if (first xs) (if (> (count xs) 1) (nth xs 1) (throw \"odd number of forms to cond\")) (cons 'cond (rest (rest xs)))))))"
1329 * `or`: "(defmacro! or (fn* (& xs) (if (empty? xs) nil (if (= 1 (count xs)) (first xs) `(let* (or_FIXME ~(first xs)) (if or_FIXME or_FIXME (or ~@(rest xs))))))))"
1330
1331
1332 <a name="step9"></a>
1333
1334 ### Step 9: Try
1335
1336 ![step9_try architecture](step9_try.png)
1337
1338 In this step you will implement the final mal special form for
1339 error/exception handling: `try*/catch*`. You will also add several core
1340 functions to your implementation. In particular, you will enhance the
1341 functional programming pedigree of your implementation by adding the
1342 `apply` and `map` core functions.
1343
1344 Compare the pseudocode for step 8 and step 9 to get a basic idea of
1345 the changes that will be made during this step:
1346 ```
1347 diff -urp ../process/step8_macros.txt ../process/step9_try.txt
1348 ```
1349
1350 * Copy `step8_macros.qx` to `step9_try.qx`.
1351
1352 * Add the `try*/catch*` special form to the EVAL function. The
1353 try catch form looks like this: `(try* A (catch* B C))`. The form
1354 `A` is evaluated, if it throws an exception, then form `C` is
1355 evaluated with a new environment that binds the symbol `B` to the
1356 value of the exception that was thrown.
1357 * If your target language has built-in try/catch style exception
1358 handling then you are already 90% of the way done. Add a
1359 (native language) try/catch block that evaluates `A` within
1360 the try block and catches all exceptions. If an exception is
1361 caught, then translate it to a mal type/value. For native
1362 exceptions this is either the message string or a mal hash-map
1363 that contains the message string and other attributes of the
1364 exception. When a regular mal type/value is used as an
1365 exception, you will probably need to store it within a native
1366 exception type in order to be able to convey/transport it using
1367 the native try/catch mechanism. Then you will extract the mal
1368 type/value from the native exception. Create a new mal environment
1369 that binds `B` to the value of the exception. Finally, evaluate `C`
1370 using that new environment.
1371 * If your target language does not have built-in try/catch style
1372 exception handling then you have some extra work to do. One of the
1373 most straightforward approaches is to create a a global error
1374 variable that stores the thrown mal type/value. The complication
1375 is that there are a bunch of places where you must check to see if
1376 the global error state is set and return without proceeding. The
1377 rule of thumb is that this check should happen at the top of your
1378 EVAL function and also right after any call to EVAL (and after any
1379 function call that might happen to call EVAL further down the
1380 chain). Yes, it is ugly, but you were warned in the section on
1381 picking a language.
1382
1383 * Add the `throw` core function.
1384 * If your language supports try/catch style exception handling, then
1385 this function takes a mal type/value and throws/raises it as an
1386 exception. In order to do this, you may need to create a custom
1387 exception object that wraps a mal value/type.
1388 * If your language does not support try/catch style exception
1389 handling, then set the global error state to the mal type/value.
1390
1391 * Add the `apply` and `map` core functions. In step 5, if you did not
1392 add the original function (`fn`) to the structure returned from
1393 `fn*`, the you will need to do so now.
1394 * `apply`: takes at least two arguments. The first argument is
1395 a function and the last argument is list (or vector). The
1396 arguments between the function and the last argument (if there are
1397 any) are concatenated with the final argument to create the
1398 arguments that are used to call the function. The apply
1399 function allows a function to be called with arguments that are
1400 contained in a list (or vector). In other words, `(apply F A B [C
1401 D])` is equivalent to `(F A B C D)`.
1402 * `map`: takes a function and a list (or vector) and evaluates the
1403 function against every element of the list (or vector) one at
1404 a time and returns the results as a list.
1405
1406 * Add some type predicates core functions. In Lisp, predicates are
1407 functions that return true/false (or true value/nil) and typically
1408 end in "?" or "p".
1409 * `nil?`: takes a single argument and returns true (mal true value)
1410 if the argument is nil (mal nil value).
1411 * `true?`: takes a single argument and returns true (mal true value)
1412 if the argument is a true value (mal true value).
1413 * `false?`: takes a single argument and returns true (mal true
1414 value) if the argument is a false value (mal false value).
1415 * `symbol?`: takes a single argument and returns true (mal true
1416 value) if the argument is a symbol (mal symbol value).
1417
1418 Now go to the top level, run the step 9 tests:
1419 ```
1420 make "test^quux^step9"
1421 ```
1422
1423 Your mal implementation is now essentially a fully featured Lisp
1424 interpreter. But if you stop now you will miss one of the most
1425 satisfying and enlightening aspects of creating a mal implementation:
1426 self-hosting.
1427
1428 #### Deferrable
1429
1430 * Add the following new core functions:
1431 * `symbol`: takes a string and returns a new symbol with the string
1432 as its name.
1433 * `keyword`: takes a string and returns a keyword with the same name
1434 (usually just be prepending the special keyword
1435 unicode symbol). This function should also detect if the argument
1436 is already a keyword and just return it.
1437 * `keyword?`: takes a single argument and returns true (mal true
1438 value) if the argument is a keyword, otherwise returns false (mal
1439 false value).
1440 * `vector`: takes a variable number of arguments and returns
1441 a vector containing those arguments.
1442 * `vector?`: takes a single argument and returns true (mal true
1443 value) if the argument is a vector, otherwise returns false (mal
1444 false value).
1445 * `hash-map`: takes a variable but even number of arguments and
1446 returns a new mal hash-map value with keys from the odd arguments
1447 and values from the even arguments respectively. This is basically
1448 the functional form of the `{}` reader literal syntax.
1449 * `map?`: takes a single argument and returns true (mal true
1450 value) if the argument is a hash-map, otherwise returns false (mal
1451 false value).
1452 * `assoc`: takes a hash-map as the first argument and the remaining
1453 arguments are odd/even key/value pairs to "associate" (merge) into
1454 the hash-map. Note that the original hash-map is unchanged
1455 (remember, mal values are immutable), and a new hash-map
1456 containing the old hash-maps key/values plus the merged key/value
1457 arguments is returned.
1458 * `dissoc`: takes a hash-map and a list of keys to remove from the
1459 hash-map. Again, note that the original hash-map is unchanged and
1460 a new hash-map with the keys removed is returned. Key arguments
1461 that do not exist in the hash-map are ignored.
1462 * `get`: takes a hash-map and a key and returns the value of looking
1463 up that key in the hash-map. If the key is not found in the
1464 hash-map then nil is returned.
1465 * `contains?`: takes a hash-map and a key and returns true (mal true
1466 value) if the key exists in the hash-map and false (mal false
1467 value) otherwise.
1468 * `keys`: takes a hash-map and returns a list (mal list value) of
1469 all the keys in the hash-map.
1470 * `vals`: takes a hash-map and returns a list (mal list value) of
1471 all the values in the hash-map.
1472 * `sequential?`: takes a single arguments and returns true (mal true
1473 value) if it is a list or a vector, otherwise returns false (mal
1474 false value).
1475
1476
1477 <a name="stepA"></a>
1478
1479 ### Step A: Metadata, Self-hosting and Interop
1480
1481 ![stepA_mal architecture](stepA_mal.png)
1482
1483 You have reached the final step of your mal implementation. This step
1484 is kind of a catchall for things that did not fit into other steps.
1485 But most importantly, the changes you make in this step will unlock
1486 the magical power known as "self-hosting". You might have noticed
1487 that one of the languages that mal is implemented in is "mal". Any mal
1488 implementation that is complete enough can run the mal implementation
1489 of mal. You might need to pull out your hammock and ponder this for
1490 a while if you have never built a compiler or interpreter before. Look
1491 at the step source files for the mal implementation of mal (it is not
1492 cheating now that you have reached step A).
1493
1494 If you deferred the implementation of keywords, vectors and hash-maps,
1495 now is the time to go back and implement them if you want your
1496 implementation to self-host.
1497
1498 Compare the pseudocode for step 9 and step A to get a basic idea of
1499 the changes that will be made during this step:
1500 ```
1501 diff -urp ../process/step9_try.txt ../process/stepA_mal.txt
1502 ```
1503
1504 * Copy `step9_try.qx` to `stepA_mal.qx`.
1505
1506 * Add the `readline` core function. This functions takes a
1507 string that is used to prompt the user for input. The line of text
1508 entered by the user is returned as a string. If the user sends an
1509 end-of-file (usually Ctrl-D), then nil is returned.
1510
1511 * Add meta-data support to mal functions. TODO. Should be separate
1512 from the function macro flag.
1513
1514 * Add a new "\*host-language\*" (symbol) entry to your REPL
1515 environment. The value of this entry should be a mal string
1516 containing thename of the current implementation.
1517
1518 * When the REPL starts up (as opposed to when it is called with
1519 a script and/or arguments), call the `rep` function with this string
1520 to print a startup header:
1521 "(println (str \"Mal [\" *host-language* \"]\"))".
1522
1523
1524 Now go to the top level, run the step A tests:
1525 ```
1526 make "test^quux^stepA"
1527 ```
1528
1529 Once you have passed all the non-optional step A tests, it is time to
1530 try self-hosting. Run your step A implementation as normal, but use
1531 the file argument mode you added in step 6 to run a each of the step
1532 from the mal implementation:
1533 ```
1534 ./stepA_mal.qx ../mal/step1_read_print.mal
1535 ./stepA_mal.qx ../mal/step2_eval.mal
1536 ...
1537 ./stepA_mal.qx ../mal/step9_try.mal
1538 ./stepA_mal.qx ../mal/stepA_mal.mal
1539 ```
1540
1541 There is a very good chance that you will encounter an error at some
1542 point while trying to run the mal in mal implementation steps above.
1543 Debugging failures that happen while self-hosting is MUCH more
1544 difficult and mind bending. One of the best approaches I have
1545 personally found is to add prn statements to the mal implementation
1546 step (not your own implementation of mal) that is causing problems.
1547
1548 Another approach I have frequently used is to pull out the code from
1549 the mal implementation that is causing the problem and simplify it
1550 step by step until you have a simple piece of mal code that still
1551 reproduces the problem. Once the reproducer is simple enough you will
1552 probably know where in your own implementation that problem is likely
1553 to be. Please add your simple reproducer as a test case so that future
1554 implementers will fix similar issues in their code before they get to
1555 self-hosting when it is much more difficult to track down and fix.
1556
1557 Once you can manually run all the self-hosted steps, it is time to run
1558 all the tests in self-hosted mode:
1559 ```
1560 make MAL_IMPL=quux "test^mal"
1561 ```
1562
1563 When you run into problems (which you almost certainly will), use the
1564 same process described above to debug them.
1565
1566 Congratulations!!! When all the tests pass, you should pause for
1567 a moment and consider what you have accomplished. You have implemented
1568 a Lisp interpreter that is powerful and complete enough to run a large
1569 mal program which is itself an implementation of the mal language. You
1570 might even be asking if you can continue the "inception" by using your
1571 implementation to run a mal implementation which itself runs the mal
1572 implementation.
1573
1574
1575 #### Optional: gensym
1576
1577 The `or` macro we introduced at step 8 has a bug. It defines a
1578 variable called `or_FIXME`, which "shadows" such a binding from the
1579 user's code (which uses the macro). If a user has a variable called
1580 `or_FIXME`, it cannot be used as an `or` macro argument. In order to
1581 fix that, we'll introduce `gensym`: a function which returns a symbol
1582 which was never used before anywhere in the program. This is also an
1583 example for the use of mal atoms to keep state (the state here being
1584 the number of symbols produced by `gensym` so far).
1585
1586 Previously you used `rep` to define the `or` macro. Remove that
1587 definition and use `rep` to define the new counter, `gensym` function
1588 and the clean `or` macro. Here are the string arguments you need to
1589 pass to `rep`:
1590 ```
1591 "(def! *gensym-counter* (atom 0))"
1592
1593 "(def! gensym (fn* [] (symbol (str \"G__\" (swap! *gensym-counter* (fn* [x] (+ 1 x)))))))"
1594
1595 "(defmacro! or (fn* (& xs) (if (empty? xs) nil (if (= 1 (count xs)) (first xs) (let* (condvar (gensym)) `(let* (~condvar ~(first xs)) (if ~condvar ~condvar (or ~@(rest xs)))))))))"
1596 ```
1597
1598 For extra information read [Peter Seibel's thorough discussion about
1599 `gensym` and leaking macros in Common Lisp](http://www.gigamonkeys.com/book/macros-defining-your-own.html#plugging-the-leaks).
1600
1601
1602 #### Optional additions
1603
1604 * Add metadata support to composite data types, symbols and native
1605 functions. TODO
1606 * Add the following new core functions:
1607 * `time-ms`: takes no arguments and returns the number of
1608 milliseconds since epoch (00:00:00 UTC January 1, 1970), or, if
1609 not possible, since another point in time (`time-ms` is usually
1610 used relatively to measure time durations). After `time-ms` is
1611 implemented, you can run the mal implementation performance
1612 benchmarks by running `make perf^quux`.
1613 * `conj`: takes a collection and one or more elements as arguments
1614 and returns a new collection which includes the original
1615 collection and the new elements. If the collection is a list, a
1616 new list is returned with the elements inserted at the start of
1617 the given list in opposite order; if the collection is a vector, a
1618 new vector is returned with the elements added to the end of the
1619 given vector.
1620 * `string?`: returns true if the parameter is a string.
1621 * `seq`: takes a list, vector, string, or nil. If an empty list,
1622 empty vector, or empty string ("") is passed in then nil is
1623 returned. Otherwise, a list is returned unchanged, a vector is
1624 converted into a list, and a string is converted to a list that
1625 containing the original string split into single character
1626 strings.
1627 * For interop with the target language, add this core function:
1628 * `quux-eval`: takes a string, evaluates it in the target language,
1629 and returns the result converted to the relevant Mal type. You
1630 may also add other interop functions as you see fit; Clojure, for
1631 example, has a function called `.` which allows calling Java
1632 methods. If the target language is a static language, consider
1633 using FFI or some language-specific reflection mechanism, if
1634 available. The tests for `quux-eval` and any other interop
1635 function should be added in `quux/tests/stepA_mal.mal` (see the
1636 [tests for `lua-eval`](../lua/tests/stepA_mal.mal) as an example).
1637
1638
1639 ## TODO:
1640
1641 * simplify: "X argument (list element Y)" -> ast[Y]
1642 * list of types with metadata: list, vector, hash-map, mal functions
1643 * more clarity about when to peek and poke in read_list and read_form
1644 * tokenizer: use first group rather than whole match (to eliminate
1645 whitespace/commas)