process/guide.md

   1 # The Make-A-Lisp Process
   2
   3 So you want to write a Lisp interpreter? Welcome!
   4
   5 The goal of the Make-A-Lisp project is to make it easy to write your
   6 own Lisp interpreter without sacrificing those many "Aha!" moments
   7 that come from ascending the McCarthy mountain. When you reach the peak
   8 of this particular mountain, you will have an interpreter for the mal
   9 Lisp language that is powerful enough to be self-hosting, meaning it
  10 will be able to run a mal interpreter written in mal itself.
  11
  12 So jump right in (er ... start the climb)!
  13
  14 ## Pick a language
  15
  16 You might already have a language in mind that you want to use.
  17 Technically speaking, mal can be implemented in any sufficiently
  18 complete programming language (i.e. Turing complete), however, there are a few
  19 language features that can make the task MUCH easier. Here are some of
  20 them in rough order of importance:
  21
  22 * A sequential compound data structure (e.g. arrays, lists,
  23   vectors, etc)
  24 * An associative compound data structure (e.g. a dictionary,
  25   hash-map, associative array, etc)
  26 * Function references (first class functions, function pointers,
  27   etc)
  28 * Real exception handling (try/catch, raise, throw, etc)
  29 * Variable argument functions (variadic, var args, splats, apply, etc)
  30 * Function closures
  31 * PCRE regular expressions
  32
  33 In addition, the following will make your task especially easy:
  34
  35 * Dynamic typing / boxed types (specifically, the ability to store
  36   different data types in the sequential and associative structures
  37   and the language keeps track of the type for you)
  38 * Compound data types support arbitrary runtime "hidden" data
  39   (metadata, metatables, dynamic fields attributes)
  40
  41 Here are some examples of languages that have all of the above
  42 features: JavaScript, Ruby, Python, Lua, R, Clojure.
  43
  44 Michael Fogus has some great blog posts on interesting but less well
  45 known languages and many of the languages on his lists do not yet have
  46 any mal implementations:
  47 * http://blog.fogus.me/2011/08/14/perlis-languages/
  48 * http://blog.fogus.me/2011/10/18/programming-language-development-the-past-5-years/
  49
  50 Many of the most popular languages already have Mal implementations.
  51 However, this should not discourage you from creating your own
  52 implementation in a language that already has one. However, if you go
  53 this route, I suggest you avoid referring to the existing
  54 implementations (i.e. "cheating") to maximize your learning experience
  55 instead of just borrowing mine. On the other hand, if your goal is to
  56 add new implementations to mal as efficiently as possible, then you
  57 SHOULD find the most similar target language implementation and refer
  58 to it frequently.
  59
  60 If you want a fairly long list of programming languages with an
  61 approximate measure of popularity, try the [Programming Language
  62 Popularity Chart](http://langpop.corger.nl/)
  63
  64
  65 ## Getting started
  66
  67 * Install your chosen language interpreter/compiler, language package
  68   manager and build tools (if applicable)
  69
  70 * Fork the mal repository on github and then clone your forked
  71   repository:
  72 ```
  73 git clone git@github.com:YOUR_NAME/mal.git
  74 cd mal
  75 ```
  76
  77 * Make a new directory for your implementation. For example, if your
  78 language is called "quux":
  79 ```
  80 mkdir quux
  81 ```
  82
  83 * Modify the top level Makefile to allow the tests to be run against
  84   your implementation. For example, if your language is named "quux"
  85   and uses "qx" as the file extension, then make the following
  86   3 modifications to Makefile:
  87 ```
  88 IMPLS = ... quux ...
  89 ...
  90 quux_STEP_TO_PROG = mylang/$($(1)).qx
  91 ...
  92 quux_RUNSTEP =  ../$(2) $(3)
  93 ```
  94
  95 This allows you to run tests against your implementation like this:
  96 ```
  97 make "test^quux^stepX"
  98 ```
  99
 100
 101 ## General hints
 102
 103 Stackoverflow and Google are your best friends. Modern polyglot
 104 developers do not memorize dozens of programming languages. Instead,
 105 they learn the peculiar terminology used with each language and then
 106 use this to search for their answers.
 107
 108 Here are some other resources where multiple languages are
 109 compared/described:
 110 * http://learnxinyminutes.com/
 111 * http://hyperpolyglot.org/
 112 * http://rosettacode.org/
 113 * http://rigaux.org/language-study/syntax-across-languages/
 114
 115 Do not let yourself be bogged down by specific problems. While the
 116 make-a-lisp process is structured as a series of steps, the reality is
 117 that building a lisp interpreter is more like a branching tree. If you
 118 get stuck on tail call optimization, or hash-maps, move on to other
 119 things. You will often have a stroke of inspiration for a problem as
 120 you work through other functionality. I have tried to structure this
 121 guide and the tests to make clear which things can be deferred until
 122 later.
 123
 124 An aside on deferrable/optional bits: when you run the tests for
 125 a given step, the last tests are often marked with an "optional"
 126 header. This indicates that these are tests for functionality that is
 127 not critical to finish a basic mal implementation. Many of the steps
 128 in this process guide have a "Deferrable" section, however, it is not
 129 quite the same meaning. Those sections include the functionality that
 130 is marked as optional in the tests, but they also include
 131 functionality that becomes mandatory at a later step. In other words,
 132 this is a "make your own Lisp adventure".
 133
 134 Use test driven development. Each step of the make-a-lisp process has
 135 a bunch of tests associated with it and there is an easy script to run
 136 all the tests for a specific step in the process. Pick a failing test,
 137 fix it, repeat until all the tests for that step pass.
 138
 139 The `process` directory contains abbreviated pseudocode and
 140 architecture images for each step of the make-a-lisp process. Use
 141 a textual diff/comparison tool to compare the previous pseudocode step
 142 with the one you are working on. The architecture images have changes
 143 from the previous step highlighted in red.
 144
 145 If you get completely stuck and are feeling like giving up, then you
 146 should "cheat" by referring to the same step or functionality in
 147 a existing implementation language. You are here to learn, not to take
 148 a test, so do not feel bad about it. Okay, you should feel a little
 149 bit bad about it.
 150
 151
 152 ## The Make-A-Lisp Process
 153
 154 In the steps that follow the name of the target language is "quux" and
 155 the file extension for that language is "qx".
 156
 157
 158 <a name="step0"></a>
 159
 160 ### Step 0: The REPL
 161
 162 ![step0_repl architecture](step0_repl.png)
 163
 164 This step is basically just creating a skeleton of your interpreter.
 165
 166 * Create a `step0_repl.qx` file in `quux/`.
 167
 168 * Add the 4 trivial functions `READ`, `EVAL`, `PRINT`, and `rep`
 169   (read-eval-print). `READ`, `EVAL`, and `PRINT` are basically just
 170   stubs that return their first parameter (a string if your target
 171   language is a statically typed) and `rep` calls them in order
 172   passing the return to the input of the next.
 173
 174 * Add a main loop that repeatedly prints a prompt (needs to be
 175   "user> " for later tests to pass), gets a line of input from the
 176   user, calls `rep` with that line of input, and then prints out the
 177   result from `rep`. It should also exit when you send it an EOF
 178   (often Ctrl-D).
 179
 180 * If you are using a compiled (ahead-of-time rather than just-in-time)
 181   language, then create a Makefile (or appropriate project definition
 182   file) in your directory.
 183
 184 It is time to run your first tests. This will check that your program
 185 does input and output in a way that can be captured by the test
 186 harness. Go to the top level and run the following:
 187 ```
 188 make "test^quux^step0"
 189 ```
 190
 191 Add and then commit your new `step0_repl.qx` and `Makefile` to git.
 192
 193 Congratulations! You have just completed the first step of the
 194 make-a-lisp process.
 195
 196
 197 #### Optional:
 198
 199 * Add full line editing and command history support to your
 200   interpreter REPL. Many languages have a library/module that provide
 201   line editing support. Another option if your language supports it is
 202   to use an FFI (foreign function interface) to load and call directly
 203   into GNU readline, editline, or linenoise library. Add line
 204   editing interface code to `readline.qx`
 205
 206
 207 <a name="step1"></a>
 208
 209 ### Step 1: Read and Print
 210
 211 ![step1_read_print architecture](step1_read_print.png)
 212
 213 In this step, your interpreter will "read" the string from the user
 214 and parse it into an internal tree data structure (an abstract syntax
 215 tree) and then take that data structure and "print" it back to
 216 a string.
 217
 218 In non-lisp languages, this step (called "lexing and parsing") can be
 219 one of the most complicated parts of the compiler/interpreter. In
 220 Lisp, the data structure that you want in memory is basically
 221 represented directly in the code that the programmer writes
 222 (homoiconicity).
 223
 224 For example, if the string is "(+ 2 (* 3 4))" then the read function
 225 will process this into a tree structure that looks like this:
 226 ```
 227           List
 228          / |  \
 229         /  |   \
 230        /   |    \
 231   Sym:+  Int:2  List
 232                / |  \
 233               /  |   \
 234              /   |    \
 235          Sym:*  Int:3  Int:4
 236 ```
 237
 238 Each left paren and its matching right paren (lisp "sexpr") becomes
 239 a node in the tree and everything else becomes a leaf in the tree.
 240
 241 If you can find code for an implementation of a JSON encoder/decoder
 242 in your target language then you can probably just borrow and modify
 243 that and be 75% of the way done with this step.
 244
 245 The rest of this section is going to assume that you are not starting
 246 from an existing JSON encoder/decoder, but that you do have access to
 247 a Perl compatible regular expressions (PCRE) module/library. You can
 248 certainly implement the reader using simple string operations, but it
 249 is more involved. The `make`, `ps` (postscript) and Haskell
 250 implementations have examples of a reader/parser without using regular
 251 expression support.
 252
 253 * Copy `step0_repl.qx` to `step1_read_print.qx`.
 254
 255 * Add a `reader.qx` file to hold functions related to the reader.
 256
 257 * If the target language has objects types (OOP), then the next step
 258   is to create a simple stateful Reader object in `reader.qx`. This
 259   object will store the tokens and a position. The Reader object will
 260   have two methods: `next` and `peek`. `next` returns the tokens at
 261   the current position and increments the position. `peek` just
 262   returns the token at the current position.
 263
 264 * Add a function `read_str` in `reader.qx`. This function
 265   will call `tokenizer` and then create a new Reader object instance
 266   with the tokens. Then it will call `read_form` with the Reader
 267   instance.
 268
 269 * Add a function `tokenizer` in `reader.qx`. This function will take
 270   a single string and return an array/list
 271   of all the tokens (strings) in it. The following regular expression
 272   (PCRE) will match all mal tokens.
 273 ```
 274 [\s,]*(~@|[\[\]{}()'`~^@]|"(?:\\.|[^\\"])*"|;.*|[^\s\[\]{}('"`,;)]*)
 275 ```
 276
 277 * Add the function `read_form` to `reader.qx`. This function
 278   will peek at the first token in the Reader object and switch on the
 279   first character of that token. If the character is a left paren then
 280   `read_list` is called with the Reader object. Otherwise, `read_atom`
 281   is called with the Reader Object. The return value from `read_form`
 282   is a mal data type. If your target language is statically typed then
 283   you will need some way for `read_form` to return a variant or
 284   subclass type. For example, if your language is object oriented,
 285   then you can define a top level MalType (in `types.qx`) that all
 286   your mal data types inherit from. The MalList type (which also
 287   inherits from MalType) will contains a list/array of other MalTypes.
 288   If your language is dynamically typed then you can likely just
 289   return a plain list/array of other mal types.
 290
 291 * Add the function `read_list` to `reader.qx`. This function will
 292   repeatedly call `read_form` with the Reader object until it
 293   encounters a ')' token (if it reach EOF before reading a ')' then
 294   that is an error). It accumulates the results into a List type.  If
 295   your language does not have a sequential data type that can hold mal
 296   type values you may need to implement one (in `types.qx`).  Note
 297   that `read_list` repeatedly calls `read_form` rather than
 298   `read_atom`. This mutually recursive defintion between `read_list`
 299   and `read_form` is what allows lists to contain lists.
 300
 301 * Add the function `read_atom` to `reader.qx`. This function will
 302   look at the contents of the token and return the appropriate scalar
 303   (simple/single) data type value. Initially, you can just implement
 304   numbers (integers) and symbols . This will allow you to proceed
 305   through the next couple of steps before you will need to implement
 306   the other fundamental mal types: nil, true, false, and string. The
 307   remaining mal types: keyword, vector, hash-map, and atom do not
 308   need to be implemented until step 9 (but can be implemented at any
 309   point between this step and that). BTW, symbols types are just an
 310   object that contains a single string name value (some languages have
 311   symbol types already).
 312
 313 * Add a file `printer.qx`. This file will contain a single function
 314   `pr_str` which does the opposite of `read_str`: take a mal data
 315   structure and return a string representation of it. But `pr_str` is
 316   much simpler and is basically just a switch statement on the type of
 317   the input object:
 318
 319   * symbol: return the string name of the symbol
 320   * number: return the number as a string
 321   * list: iterate through each element of the list calling `pr_str` on
 322     it, then join the results with a space separator, and surround the
 323     final result with parens
 324
 325 * Change the `READ` function in `step1_read_print.qx` to call
 326   `reader.read_str` and the `PRINT` function to call `printer.pr_str`.
 327   `EVAL` continues to simply return its input but the type is now
 328   a mal data type.
 329
 330 You now have enough hooked up to begin testing your code. You can
 331 manually try some simple inputs:
 332   * `123` -> `123`
 333   * `   123  ` -> `123`
 334   * `abc` -> `abc`
 335   * `   abc   ` -> `abc`
 336   * `(123 456)` -> `(123 456)`
 337   * `(  123   456 789   )   ` -> `(123 456 789)`
 338   * `(  + 2   (*  3  4)  )  ` -> `(+ 2 (* 3 4))`
 339
 340 To verify that your code is doing more than just eliminating extra
 341 spaces (and not failing), you can instrument your `reader.qx` functions.
 342
 343 Once you have gotten past those simple manual tests, it is time to run
 344 the full suite of step 1 tests. Go to the top level and run the
 345 following:
 346 ```
 347 make "test^quux^step1"
 348 ```
 349
 350 Fix any test failures related to symbols, numbers and lists.
 351
 352 Depending on the functionality of your target language, it is likely
 353 that you have now just completed one of the most difficult steps. It
 354 is down hill from here. The remaining steps will probably be easier
 355 and each step will give progressively more bang for the buck.
 356
 357 #### Deferrable:
 358
 359
 360 * Add error checking to your reader functions to make sure parens
 361   are properly matched. Catch and print these errors in your main
 362   loop. If your language does not have try/catch style bubble up
 363   exception handling, then you will need to add explicit error
 364   handling to your code to catch and pass on errors without crashing.
 365
 366 * Add support for the other basic data type to your reader and printer
 367   functions: string, nil, true, and false. These become mandatory at
 368   step 4. When a string is read, the following transformations are
 369   applied: a backslash followed by a doublequote is translated into
 370   a plain doublequote character, a backslash followed by "n" is
 371   translated into a newline, and a backslash followed by another
 372   backslash is translated into a single backslash. To properly print
 373   a string (for step 4 string functions), the `pr_str` function needs
 374   another parameter called `print_readably`.  When `print_readably` is
 375   true, doublequotes, newlines, and backslashes are translated into
 376   their printed representations (the reverse of the reader). The
 377   `PRINT` function in the main program should call `pr_str` with
 378   print_readably set to true.
 379
 380 * Add support for the other mal types: keyword, vector, hash-map, and
 381   atom.
 382   * keyword: a keyword is a token that begins with a colon. A keyword
 383     can just be stored as a string with special unicode prefix like
 384     0x29E (or char 0xff/127 if the target language does not have good
 385     unicode support) and the printer translates strings with that
 386     prefix back to the keyword representation. This makes it easy to
 387     use keywords as hash map keys in most languages. You can also
 388     store keywords as a unique data type, but you will need to make
 389     sure they can be used as hash map keys (which may involve doing
 390     a similar prefixed translation anyways).
 391   * vector: a vector can be implemented with same underlying
 392     type as a list as long as there is some mechanism to keep track of
 393     the difference. You can use the same reader function for both
 394     lists and vectors by adding parameters for the starting and ending
 395     tokens.
 396   * hash-map: a hash-map is an associative data structure that maps
 397     strings to other mal values. If you implement keywords as prefixed
 398     strings, then you only need a native associative data structure
 399     which supports string keys. Clojure allows any value to be a hash
 400     map key, but the base functionality in mal is to support strings
 401     and keyword keys. Because of the representation of hash-maps as
 402     an alternating sequence of keys and values, you can probably use
 403     the same reader function for hash-maps as lists and vectors with
 404     parameters to indicate the starting and ending tokens. The odd
 405     tokens are then used for keys with the corresponding even tokens
 406     as the values.
 407
 408 * Add support for reader macros which are forms that are
 409   transformed into other forms during the read phase. Refer to
 410   `tests/step1_read_print.mal` for the form that these macros should
 411   take (they are just simple transformations of the token stream).
 412
 413 * Add comment support to your reader. The tokenizer should ignore
 414   tokens that start with ";". Your `read_str` function will need to
 415   properly handle when the tokenizer returns no values. The simplest
 416   way to do this is to return `nil` mal value. A cleaner option (that
 417   does not print `nil` at the prompt is to throw a special exception
 418   that causes the main loop to simply continue at the beginning of the
 419   loop without calling `rep`.
 420
 421
 422 <a name="step2"></a>
 423
 424 ### Step 2: Eval
 425
 426 ![step2_eval architecture](step2_eval.png)
 427
 428 In step 1 your mal interpreter was basically just a way to validate
 429 input and eliminate extraneous white space. In this step you will turn
 430 your interpreter into a simple number calculator by adding
 431 functionality to the evaluator (`EVAL`).
 432
 433 Compare the pseudocode for step 1 and step 2 to get a basic idea of
 434 the changes that will be made during this step:
 435 ```
 436 diff -urp ../process/step1_read_print.txt ../process/step2_eval.txt
 437 ```
 438
 439 * Copy `step1_read_print.qx` to `step2_eval.qx`.
 440
 441 * Define a simple initial REPL environment. This environment is an
 442   associative structure that maps symbols (or symbol names) to
 443   numeric functions. For example, in python this would look something
 444   like this:
 445 ```
 446 repl_env = {'+': lambda a,b: a+b,
 447             '-': lambda a,b: a-b,
 448             '*': lambda a,b: a*b,
 449             '/': lambda a,b: int(a/b)}
 450 ```
 451
 452 * Modify the `rep` function to pass the REPL environment as the second
 453   parameter for the `EVAL` call.
 454
 455 * Create a new function `eval_ast` which takes `ast` (mal data type)
 456   and an associative structure (the environment from above).
 457   `eval_ast` switches on the type of `ast` as follows:
 458
 459   * symbol: lookup the symbol in the environment structure and return
 460     the value or raise an error no value is found
 461   * list: return a new list that is the result of calling `EVAL` on
 462     each of the members of the list
 463   * otherwise just return the original `ast` value
 464
 465 * Modify `EVAL` to check if the first parameter `ast` is a list.
 466   * `ast` is not a list: then return the result of calling `eval_ast`
 467     on it.
 468   * `ast` is a list: call `eval_ast` to get a new evaluated list. Take
 469     the first item of the evaluated list and call it as function using
 470     the rest of the evaluated list as its arguments.
 471
 472 If your target language does not have full variable length argument
 473 support (e.g. variadic, vararg, splats, apply) then you will need to
 474 pass the full list of arguments as a single parameter and split apart
 475 the individual values inside of every mal function. This is annoying,
 476 but workable.
 477
 478 The process of taking a list and invoking or executing it to return
 479 something new is known in Lisp as the "apply" phase.
 480
 481 Try some simple expressions:
 482
 483   * `(+ 2 3)` -> `5`
 484   * `(+ 2 (* 3 4))` -> `14`
 485
 486 The most likely challenge you will encounter is how to properly call
 487 a function references using an arguments list.
 488
 489 Now go to the top level, run the step 2 tests and fix the errors.
 490 ```
 491 make "test^quux^step2"
 492 ```
 493
 494 You now have a simple prefix notation calculator!
 495
 496
 497 <a name="step3"></a>
 498
 499 ### Step 3: Environments
 500
 501 ![step3_env architecture](step3_env.png)
 502
 503 In step 2 you were already introduced to REPL environment (`repl_env`)
 504 where the basic numeric functions were stored and looked up. In this
 505 step you will add the ability to create new environments (`let*`) and
 506 modify existing environments (`def!`).
 507
 508 A Lisp environment is an associative data structure that maps symbols (the
 509 keys) to values. But Lisp environments have an additional important
 510 function: they can refer to another environment (the outer
 511 environment). During environment lookups, if the current environment
 512 does not have the symbol, the lookup continues in the outer
 513 environment, and continues this way until the symbol is either found,
 514 or the outer environment is `nil` (the outermost environment in the
 515 chain).
 516
 517 Compare the pseudocode for step 2 and step 3 to get a basic idea of
 518 the changes that will be made during this step:
 519 ```
 520 diff -urp ../process/step2_eval.txt ../process/step3_env.txt
 521 ```
 522
 523 * Copy `step2_eval.qx` to `step3_env.qx`.
 524
 525 * Create `env.qx` to hold the environment definition.
 526
 527 * Define an `Env` object that is instantiated with a single `outer`
 528   parameter and starts with an empty associative data structure
 529   property `data`.
 530
 531 * Define three methods for the Env object:
 532   * set: takes a symbol key and a mal value and adds to the `data`
 533     structure
 534   * find: takes a symbol key and if the current environment contains
 535     that key then return the environment. If no key is found and outer
 536     is not `nil` then call find (recurse) on the outer environment.
 537   * get: takes a symbol key and uses the `find` method to locate the
 538     environment with the key, then returns the matching value. If no
 539     key is found up the outer chain, then throws/raises a "not found"
 540     error.
 541
 542 * Update `step3_env.qx` to use the new `Env` type to create the
 543   repl_env (with a `nil` outer value) and use the `set` method to add
 544   the numeric functions.
 545
 546 * Modify `eval_ast` to call the `get` method on the `env` parameter.
 547
 548 * Modify the apply section of `EVAL` to switch on the first element of
 549   the list:
 550   * symbol "def!": call the set method of the current environment
 551     (second parameter of `EVAL` called `env`) using the unevaluated
 552     first parameter (second list element) as the symbol key and the
 553     evaluated second parameter as the value.
 554   * symbol "let\*": create a new environment using the current
 555     environment as the outer value and then use the first parameter as
 556     a list of new bindings in the "let\*" environment. Take the second
 557     element of the binding list, call `EVAL` using the new "let\*"
 558     environment as the evaluation environment, then call `set` on the
 559     "let\*" environment using the first binding list element as the key
 560     and the evaluated second element as the value. This is repeated
 561     for each odd/even pair in the binding list. Note in particular,
 562     the bindings earlier in the list can be referred to by later
 563     bindings. Finally, the second parameter (third element) of the
 564     original `let*` form is evaluated using the new "let\*" environment
 565     and the result is returned as the result of the `let*` (the new
 566     let environment is discarded upon completion).
 567   * otherwise: call `eval_ast` on the list and apply the first element
 568     to the rest as before.
 569
 570 `def!` and `let*` are Lisp "specials" (or "special atoms") which means
 571 that they are language level features and more specifically that the
 572 rest of the list elements (arguments) may be evaluated differently (or
 573 not at all) unlike the default apply case where all elements of the
 574 list are evaluated before the first element is invoked. Lists which
 575 contain a "special" as the first element are known as "special forms".
 576 The are special because the follow special evaluation rules.
 577
 578 Try some simple environment tests:
 579
 580   * `(def! a 6)` -> `6`
 581   * `a` -> `6`
 582   * `(def! b (+ a 2))` -> `8`
 583   * `(+ a b)` -> `14`
 584   * `(let* (c 2) c)` -> `2`
 585
 586 Now go to the top level, run the step 3 tests and fix the errors.
 587 ```
 588 make "test^quux^step3"
 589 ```
 590
 591 You mal implementation is still basically just a numeric calculator
 592 with save/restore capability. But you have set the foundation for step
 593 4 where it will begin to feel like a real programming language.
 594
 595
 596 An aside on mutation and typing:
 597
 598 The "!" suffix on symbols is used to indicate that this symbol refers
 599 to a function that mutates something else. In this case, the `def!`
 600 symbol indicates a special form that will mutate the current
 601 environment. Many (maybe even most) of runtime problems that are
 602 encountered in software engineering are a result of mutation. By
 603 clearly marking code where mutation may occur, you can more easily
 604 track down the likely cause of runtime problems when they do occur.
 605
 606 Another cause of runtime errors is type errors, where a value of one
 607 type is unexpectedly treated by the program as a different and
 608 incompatible type. Statically typed languages try to make the
 609 programmer solve all type problems before the program is allowed to
 610 run. Most Lisp variants tend to be dynamically typed (types of values
 611 are checked when they are actually used at runtime).
 612
 613 As an aside-aside: The great debate between static and dynamic typing
 614 can be understood by following the money. Advocates of strict static
 615 typing use words like "correctness" and "safety" and thus get
 616 government and academic funding. Advocates of dynamic typing use words
 617 like "agile" and "time-to-market" and thus get venture capital and
 618 commercial funding.
 619
 620
 621 <a name="step4"></a>
 622
 623 ### Step 4: If Fn Do
 624
 625 ![step4_if_fn_do architecture](step4_if_fn_do.png)
 626
 627 In step 3 you added environments and the special forms for
 628 manipulating environments. In this step you will add 3 new special
 629 forms (`if`, `fn*` and `do`) and add several more core functions to
 630 the default REPL environment. Our new architecture will look like
 631 this:
 632
 633 The `fn*` special form is how new user-defined functions are created.
 634 In some Lisps, this special form is named "lambda".
 635
 636 Compare the pseudocode for step 3 and step 4 to get a basic idea of
 637 the changes that will be made during this step:
 638 ```
 639 diff -urp ../process/step3_env.txt ../process/step4_if_fn_do.txt
 640 ```
 641
 642 * Copy `step3_env.qx` to `step4_if_fn_do.qx`.
 643
 644 * If you have not implemented reader and printer support (and data
 645   types) for `nil`, `true` and `false`, you will need to do so for
 646   this step.
 647
 648 * Update the constructor/initializer for environments to take two new
 649   arguments: `binds` and `exprs`. Bind (`set`) each element (symbol)
 650   of the binds list to the respective element of the `exprs` list.
 651
 652 * Add support to `printer.qx` to print functions values. A string
 653   literal like "#<function>" is sufficient.
 654
 655 * Add the following special forms to `EVAL`:
 656
 657   * `do`: Evaluate all the elements of the list using `eval_ast`
 658     and return the final evaluated element.
 659   * `if`: Evaluate the first parameter (second element). If the result
 660     (condition) is anything other than `nil` or `false`, then evaluate
 661     the second parameter (third element of the list) and return the
 662     result.  Otherwise, evaluate the third parameter (fourth element)
 663     and return the result. If condition is false and there is no third
 664     parameter, then just return `nil`.
 665   * `fn*`: Return a new function closure. The body of that closure
 666     does the following:
 667     * Create a new environment using `env` (closed over from outer
 668       scope) as the `outer` parameter, the first parameter (second
 669       list element of `ast` from the outer scope) as the `binds`
 670       parameter, and the parameters to the closure as the `exprs`
 671       parameter.
 672     * Call `EVAL` on the second parameter (third list element of `ast`
 673       from outer scope), using the new environment. Use the result as
 674       the return value of the closure.
 675
 676 If your target language does not support closures, then you will need
 677 to implement `fn*` using some sort of structure or object that stores
 678 the values being closed over: the first and second elements of the
 679 `ast` list (function parameter list and function body) and the current
 680 environment `env`. In this case, your native functions will need to be
 681 wrapped in the same way. You will probably also need a method/function
 682 that invokes your function object/structure for the default case of
 683 the apply section of `EVAL`.
 684
 685 Try out the basic functionality you have implemented:
 686
 687   * `(fn* [a] a)` -> `#<function>`
 688   * `( (fn* [a] a) 7)` -> `7`
 689   * `( (fn* [a] (+ a 1)) 10)` -> `11`
 690   * `( (fn* [a b] (+ a b)) 2 3)` -> `5`
 691
 692 * Add a new file `core.qx` and define an associative data structure
 693   `ns` (namespace) that maps symbols to functions. Move the numeric
 694   function definitions into this structure.
 695
 696 * Modify `step4_if_fn_do.qx` to iterate through the `core.ns`
 697   structure and add (`set`) each symbol/function mapping to the
 698   REPL environment (`repl_env`).
 699
 700 * Add the following functions to `core.ns`:
 701   * `list`: take the parameters and return them as a list.
 702   * `list?`: return true if the first parameter is a list, false
 703     otherwise.
 704   * `empty?`: treat the first parameter as a list and return true if
 705     the list is empty and false if it contains any elements.
 706   * `count`: treat the first parameter as a list and return the number
 707     of elements that it contains.
 708   * `=`: compare the first two parameters and return true if they are
 709     the same type and contain the same value. In the case of equal
 710     length lists, each element of the list should be compared for
 711     equality and if they are the same return true, otherwise false.
 712   * `<`, `<=`, `>`, and `>=`: treat the first two parameters as
 713     numbers and do the corresponding numeric comparison, returning
 714     either true or false.
 715
 716 Now go to the top level, run the step 4 tests. There are a lot of
 717 tests in step 4 but all of the non-optional tests that do not involve
 718 strings should be able to pass now.
 719
 720 ```
 721 make "test^quux^step4"
 722 ```
 723
 724 Your mal implementation is already beginning to look like a real
 725 language. You have flow control, conditionals, user-defined functions
 726 with lexical scope, side-effects (if you implement the string
 727 functions), etc. However, our little interpreter has not quite reached
 728 Lisp-ness yet. The next several steps will take your implementation
 729 from a neat toy to a full featured language.
 730
 731 #### Deferrable:
 732
 733 * Implement Clojure-style variadic function parameters. Modify the
 734   constructor/initializer for environments, so that if a "&" symbol is
 735   encountered in the `binds` list, the next symbol in the `binds` list
 736   after the "&" is bound to the rest of the `exprs` list that has not
 737   been bound yet.
 738
 739 * Defines a `not` function using mal itself. In `step4_if_fn_do.qx`
 740   call the `rep` function with this string:
 741   "(def! not (fn* (a) (if a false true)))".
 742
 743 * Implement the strings functions in `core.qx`. To implement these
 744   functions, you will need to implement the string support in the
 745   reader and printer (deferrable section of step 1). Each of the string
 746   functions takes multiple mal values, prints them (`pr_str`) and
 747   joins them together into a new string.
 748   * `pr-str`: calls `pr_str` on each argument with `print_readably`
 749     set to true, joins the results with " " and returns the new
 750     string.
 751   * `str`: calls `pr_str` on each argument with `print_readably` set
 752     to false, concatenates the results together ("" separator), and
 753     returns the new string.
 754   * `prn`:  calls `pr_str` on each argument with `print_readably` set
 755     to true, joins the results with " ", prints the string to the
 756     screen and then returns `nil`.
 757   * `println`:  calls `pr_str` on each argument with `print_readably` set
 758     to false, joins the results with " ", prints the string to the
 759     screen and then returns `nil`.
 760
 761
 762 <a name="step5"></a>
 763
 764 ### Step 5: Tail call optimization
 765
 766 ![step5_tco architecture](step5_tco.png)
 767
 768 In step 4 you added special forms `do`, `if` and `fn*` and you defined
 769 some core functions. In this step you will add a Lisp feature called
 770 tail call optimization (TCO). Also called "tail recursion" or
 771 sometimes just "tail calls".
 772
 773 Several of the special forms that you have defined in `EVAL` end up
 774 calling back into `EVAL`. For those forms that call `EVAL` as the last
 775 thing that they do before returning (tail call) you will just loop back
 776 to the beginning of eval rather than calling it again. The advantage
 777 of this approach is that it avoids adding more frames to the call
 778 stack. This is especially important in Lisp languages because they do
 779 not tend to have iteration control structures preferring recursion
 780 instead. However, with tail call optimization, recursion can be made
 781 as stack efficient as iteration.
 782
 783 Compare the pseudocode for step 4 and step 5 to get a basic idea of
 784 the changes that will be made during this step:
 785 ```
 786 diff -urp ../process/step4_if_fn_do.txt ../process/step5_tco.txt
 787 ```
 788
 789 * Copy `step4_if_fn_do.qx` to `step5_tco.qx`.
 790
 791 * Add a loop (e.g. while true) around all code in `EVAL`.
 792
 793 * Modify each of the following form cases to add tail call recursion
 794   support:
 795   * `let*`: remove the final `EVAL` call on the second `ast` argument
 796     (third list element). Set `env` (i.e. the local variable passed in
 797     as second parameter of `EVAL`) to the new let environment. Set
 798     `ast` (i.e. the local variable passed in as first parameter of
 799     `EVAL`) to be the second `ast` argument. Continue at the beginning
 800     of the loop (no return).
 801   * `do`: change the `eval_ast` call to evaluate all the parameters
 802     except for the last (2nd list element up to but not including
 803     last). Set `ast` to the last element of `ast`. Continue
 804     at the beginning of the loop (`env` stays unchanged).
 805   * `if`: the condition continues to be evaluated, however, rather
 806     than evaluating the true or false branch, `ast` is set to the
 807     unevaluated value of the chosen branch. Continue at the beginning
 808     of the loop (`env` is unchanged).
 809
 810 * The return value from the `fn*` special form will now become an
 811   object/structure with attributes that allow the default invoke case
 812   of `EVAL` to do TCO on mal functions. Those attributes are:
 813   * `ast`: the second `ast` argument (third list element) representing
 814     the body of the function.
 815   * `params`: the first `ast` argument (second list element)
 816     representing the parameter names of the function.
 817   * `env`: the current value of the `env` parameter of `EVAL`.
 818   * `fn`: the original function value (i.e. what was return by `fn*`
 819     in step 4). Note that this is deferrable until step 9 when it is
 820     needed for the `map` and `apply` core functions).
 821
 822 * The default "apply"/invoke case of `EVAL` must now be changed to
 823   account for the new object/structure returned by the `fn*` form.
 824   Continue to call `eval_ast` on `ast`. The first element is `f`.
 825   Switch on the type of `f`:
 826   * regular function (not one defined by `fn*`): apply/invoke it as
 827     before (in step 4).
 828   * a `fn*` value: set `ast` to the `ast` attribute of `f`. Generate
 829     a new environment using the `env` and `params` attributes of `f`
 830     as the `outer` and `binds` arguments and rest `ast` arguments
 831     (list elements 2 through the end) as the `exprs` argument. Set
 832     `env` to the new environment. Continue at the beginning of the loop.
 833
 834 Run some manual tests from previous steps to make sure you have not
 835 broken anything by adding TCO.
 836
 837 Now go to the top level, run the step 5 tests.
 838
 839 ```
 840 make "test^quux^step5"
 841 ```
 842
 843 Look at the step 5 test file `tests/step5_tco.mal`. The `sum-to`
 844 function cannot be tail call optimized because it does something after
 845 the recursive call (`sum-to` calls itself and then does the addition).
 846 Lispers say that the `sum-to` is not in tail position. The `sum2`
 847 function however, calls itself from tail position. In other words, the
 848 recursive call to `sum2` is the last action that `sum2` does. Calling
 849 `sum-to` with a large value will cause a stack overflow exception in
 850 most target languages (some have super-special tricks they use to
 851 avoid stack overflows).
 852
 853 Congratulations, your mal implementation already has a feature (TCO)
 854 that most mainstream languages lack.
 855
 856
 857 <a name="step6"></a>
 858
 859 ### Step 6: Files and Evil
 860
 861 ![step6_file architecture](step6_file.png)
 862
 863 In step 5 you added tail call optimization. In this step you will add
 864 some string and file operations and give your implementation a touch
 865 of evil ... er, eval. And as long as your language supports function
 866 closures, this step will be quite simple. However, to complete this
 867 step, you must implement string type support, so if you have been
 868 holding off on that you will need to go back and do so.
 869
 870 Compare the pseudocode for step 5 and step 6 to get a basic idea of
 871 the changes that will be made during this step:
 872 ```
 873 diff -urp ../process/step5_tco.txt ../process/step6_file.txt
 874 ```
 875
 876 * Copy `step5_tco.qx` to `step6_file.qx`.
 877
 878 * Add two new string functions to the core namespaces:
 879   * `read-string`: this function just exposes the `read_str` function
 880     from the reader. If your mal string type is not the same as your
 881     target language (e.g. statically typed language) then your
 882     `read-string` function will need to unbox (extract) the raw string
 883     from the mal string type in order to call `read_str`.
 884   * `slurp`: this function takes a file name (string) and returns the
 885     contents of the file as a string. Once again, if your mal string
 886     type wraps a raw target language string, then you will need to
 887     unmarshall (extract) the string parameter to get the raw file name
 888     string and marshall (wrap) the result back to a mal string type.
 889
 890 * In your main program, add a new symbol "eval" to your REPL
 891   environment. The value of this new entry is a function that takes
 892   a single argument `ast`. The closure calls the your `EVAL` function
 893   using the `ast` as the first argument and the REPL environment
 894   (closed over from outside) as the second argument. The result of
 895   the `EVAL` call is returned. This simple but powerful addition
 896   allows your program to treat mal data as a mal program. For example,
 897   you can now to this:
 898 ```
 899 (def! mal-prog (list + 1 2))
 900 (eval mal-prog)
 901 ```
 902
 903 * Define a `load-file` function using mal itself. In your main
 904   program call the `rep` function with this string:
 905   "(def! load-file (fn* (f) (eval (read-string (str \"(do \" (slurp f) \")\")))))".
 906
 907 Try out `load-file`:
 908   * `(load-file "../tests/incA.mal")` -> `9`
 909   * `(inc4 3)` -> `7`
 910
 911 The `load-file` function does the following:
 912   * Call `slurp` to read in a file by name. Surround the contents with
 913     "(do ...)" so that the whole file will be treated as a single
 914     program AST (abstract syntax tree).
 915   * Call `read-string` on the string returned from `slurp`. This uses
 916     the reader to read/convert the file contents into mal data/AST.
 917   * Call `eval` (the one in the REPL environment) on the AST returned
 918     from `read-string` to "run" it.
 919
 920 Besides adding file and eval support, we'll add support for the atom data type
 921 in this step.  An atom is the Mal way to represent *state*; it is
 922 heavily inspired by [Clojure's atoms](http://clojure.org/state).  An atom holds
 923 a reference to a single Mal value of any type; it supports reading that Mal value
 924 and *modifying* the reference to point to another Mal value.  Note that this is
 925 the only Mal data type that is mutable (but the Mal values it refers to are
 926 still immutable; immutability is explained in greater detail in step 7).
 927 You'll need to add 5 functions to the core namesapce to support atoms:
 928
 929   * `atom`: Takes a Mal value and returns a new atom which points to that Mal value.
 930   * `atom?`: Takes an argument and returns `true` if the argument is an atom.
 931   * `deref`: Takes an atom argument and returns the Mal value referenced by this atom.
 932   * `reset!`: Takes an atom and a Mal value; the atom is modified to refer to
 933     the given Mal value. The Mal value is returned.
 934   * `swap!`: Takes an atom, a function, and zero or more function arguments. The
 935     atom's value is modified to result of applying the function with the atom's
 936     value as the first argument and the optionally given function arguments as
 937     the rest of the arguments. The new atom's value is returned. (Side note: Mal is
 938     single-threaded, but in concurrent languages like Clojure, `swap!` promises
 939     atomic update: `(swap! myatom (fn* [x] (+ 1 x)))` will always increase the
 940     `myatom` counter by one and will not suffer from missing updates when the
 941     atom is updated from multiple threads.)
 942
 943 Optionally, you can add a reader macro `@` which will serve as a short form for
 944 `deref`, so that `@a` is equivalent to `(deref a)`.  In order to do that, modify
 945 the conditional in reader `read_form` function and add a case which deals with
 946 the `@` token: if the token is `@` (at sign) then return a new list that
 947 contains the symbol `deref` and the result of reading the next form
 948 (`read_form`).
 949
 950 Now go to the top level, run the step 6 tests. The optional tests will
 951 need support from the reader for comments, vectors, hash-maps and the `@`
 952 reader macro:
 953 ```
 954 make "test^quux^step6"
 955 ```
 956
 957 Congratulations, you now have a full-fledged scripting language that
 958 can run other mal programs. The `slurp` function loads a file as
 959 a string, the `read-string` function calls the mal reader to turn that
 960 string into data, and the `eval` function takes data and evaluates it
 961 as a normal mal program. However, it is important to note that the
 962 `eval` function is not just for running external programs. Because mal
 963 programs are regular mal data structures, you can dynamically generate
 964 or manipulate those data structures before calling `eval` on them.
 965 This isomorphisism (same shape) between data and programs is known as
 966 "homoiconicity". Lisp languages are homoiconic and this property
 967 distinguishes them from most other programming languages.
 968
 969 You mal implementation is quite powerful already but the set of
 970 functions that are available (from `core.qx`) is fairly limited. The
 971 bulk of the functions you will add are described in step 9 and step A,
 972 but you will begin to flesh them out over the next few steps to
 973 support quoting (step 7) and macros (step 8).
 974
 975
 976 #### Deferrable:
 977
 978 * Add the ability to run another mal program from the command line.
 979   Prior to the REPL loop, check if your mal implementation is called
 980   with command line arguments. If so, treat the first argument as
 981   a filename and use `rep` to call `load-file` on that filename, and
 982   finally exit/terminate execution.
 983
 984 * Add the rest of the command line arguments to your REPL environment
 985   so that programs that are run with `load-file` have access to their
 986   calling environmnet. Add a new "\*ARGV\*" (symbol) entry to your REPL
 987   environment. The value of this entry should be the rest of the
 988   command line arguments as a mal list value.
 989
 990
 991 <a name="step7"></a>
 992
 993 ### Step 7: Quoting
 994
 995 ![step7_quote architecture](step7_quote.png)
 996
 997 In step 7 you will add the special forms `quote` and `quasiquote` and
 998 add supporting core functions `cons` and `concat`. The two quote forms
 999 add a powerful abstraction for manipulating mal code itself
1000 (meta-programming).
1001
1002 The `quote` special form indicates to the evaluator (`EVAL`) that the
1003 parameter should not be evaluated (yet). At first glance, this might
1004 not seem particularly useful but an example of what this enables is the
1005 ability for a mal program to refer to a symbol itself rather than the
1006 value that it evaluates to. Likewise with lists. For example, consider
1007 the following:
1008
1009 * `(prn abc)`: this will lookup the symbol `abc` in the current
1010   evaluation environment and print it. This will result in error if
1011   `abc` is not defined.
1012 * `(prn (quote abc))`: this will print "abc" (prints the symbol
1013   itself). This will work regardless of whether `abc` is defined in
1014   the current environment.
1015 * `(prn (1 2 3))`: this will result in an error because `1` is not
1016   a function and cannot be applied to the arguments `(2 3)`.
1017 * `(prn (quote (1 2 3)))`: this will print "(1 2 3)".
1018 * `(def! l (quote (1 2 3)))`: list quoting allows us to define lists
1019   directly in the code (list literal). Another way of doing this is
1020   with the list function: `(def! l (list 1 2 3))`.
1021
1022 The second special quoting form is `quasiquote`. This allows a quoted
1023 list to have internal elements of the list that are temporarily
1024 unquoted (normal evaluation). There are two special forms that only
1025 mean something within a quasiquoted list: `unquote` and
1026 `splice-unquote`. These are perhaps best explained with some examples:
1027
1028 * `(def! lst (quote (2 3)))` -> `(2 3)`
1029 * `(quasiquote (1 (unquote lst)))` -> `(1 (2 3))`
1030 * `(quasiquote (1 (splice-unquote lst)))` -> `(1 2 3)`
1031
1032 The `unquote` form turns evaluation back on for its argument and the
1033 result of evaluation is put in place into the quasiquoted list. The
1034 `splice-unquote` also turns evaluation back on for its argument, but
1035 the evaluated value must be a list which is then "spliced" into the
1036 quasiquoted list. The true power of the quasiquote form will be
1037 manifest when it used together with macros (in the next step).
1038
1039 Compare the pseudocode for step 6 and step 7 to get a basic idea of
1040 the changes that will be made during this step:
1041 ```
1042 diff -urp ../process/step6_file.txt ../process/step7_quote.txt
1043 ```
1044
1045 * Copy `step6_file.qx` to `step7_quote.qx`.
1046
1047 * Before implementing the quoting forms, you will need to implement
1048 * some supporting functions in the core namespace:
1049   * `cons`: this function takes a list as its second
1050     parameter and returns a new list that has the first argument
1051     prepended to it.
1052   * `concat`: this functions takes 0 or more lists as
1053     parameters and returns a new list that is a concatenation of all
1054     the list parameters.
1055
1056 An aside on immutability: note that neither cons or concat mutate
1057 their original list arguments. Any references to them (i.e. other
1058 lists that they may be "contained" in) will still refer to the
1059 original unchanged value. Mal, like Clojure, is a language which uses
1060 immutable data structures. I encourage you to read about the power and
1061 importance of immutability as implemented in Clojure (from which
1062 Mal borrows most of its syntax and feature-set).
1063
1064 * Add the `quote` special form. This form just returns its argument
1065   (the second list element of `ast`).
1066
1067 * Add the `quasiquote` special form. First implement a helper function
1068   `is_pair` that returns true if the parameter is a non-empty list.
1069   Then define a `quasiquote` function. This is called from `EVAL` with
1070   the first `ast` argument (second list element) and then `ast` is set
1071   to the result and execution continues at the top of the loop (TCO).
1072   The `quasiquote` function takes a parameter `ast` and has the
1073   following conditional:
1074   1. if `is_pair` of `ast` is false: return a new list containing:
1075      a symbol named "quote" and `ast`.
1076   2. else if the first element of `ast` is a symbol named "unquote":
1077      return the second element of `ast`.
1078   3. if `is_pair` of first element of `ast` is true and the first
1079      element of first element of `ast` (`ast[0][0]`) is a symbol named
1080      "splice-unquote": return a new list containing: a symbol named
1081      "concat", the second element of first element of `ast`
1082      (`ast[0][1]`), and the result of calling `quasiquote` with the
1083      second through last element of `ast`.
1084   4. otherwise: return a new list containing: a symbol named "cons", the
1085      result of calling `quasiquote` on first element of `ast`
1086      (`ast[0]`), and result of calling `quasiquote` with the second
1087      through last element of `ast`.
1088
1089
1090 Now go to the top level, run the step 7 tests:
1091 ```
1092 make "test^quux^step7"
1093 ```
1094
1095 Quoting is one of the more mundane functions available in mal, but do
1096 not let that discourage you. Your mal implementation is almost
1097 complete, and quoting sets the stage for the next very exiting step:
1098 macros.
1099
1100
1101 #### Deferrable
1102
1103 * The full names for the quoting forms are fairly verbose. Most Lisp
1104   languages have a short-hand syntax and Mal is no exception. These
1105   short-hand syntaxes are known as reader macros because they allow us
1106   to manipulate mal code during the reader phase. Macros that run
1107   during the eval phase are just called "macros" and are described in
1108   the next section. Expand the conditional with reader `read_form`
1109   function to add the following four cases:
1110   * token is "'" (single quote): return a new list that contains the
1111     symbol "quote" and the result of reading the next form
1112     (`read_form`).
1113   * token is "\`" (back-tick): return a new list that contains the
1114     symbol "quasiquote" and the result of reading the next form
1115     (`read_form`).
1116   * token is "~" (tilde): return a new list that contains the
1117     symbol "unquote" and the result of reading the next form
1118     (`read_form`).
1119   * token is "~@" (tilde + at sign): return a new list that contains
1120     the symbol "splice-unquote" and the result of reading the next
1121     form (`read_form`).
1122
1123 * Add support for quoting of vectors. The `is_pair` function should
1124   return true if the argument is a non-empty list or vector. `cons`
1125   should also accept a vector as the second argument. The return value
1126   is a list regardless. `concat` should support concatenation of
1127   lists, vectors, or a mix or both. The result is always a list.
1128
1129
1130 <a name="step8"></a>
1131
1132 ### Step 8: Macros
1133
1134 ![step8_macros architecture](step8_macros.png)
1135
1136 Your mal implementation is now ready for one of the most Lispy and
1137 exciting of all programming concepts: macros. In the previous step,
1138 quoting enabled some simple manipulation data structures and therefore
1139 manipulation of mal code (because the `eval` function from step
1140 6 turns mal data into code). In this step you will be able to mark mal
1141 functions as macros which can manipulate mal code before it is
1142 evaluated. In other words, macros are user-defined special forms. Or
1143 to look at it another way, macros allow mal programs to redefine
1144 the mal language itself.
1145
1146 Compare the pseudocode for step 7 and step 8 to get a basic idea of
1147 the changes that will be made during this step:
1148 ```
1149 diff -urp ../process/step7_quote.txt ../process/step8_macros.txt
1150 ```
1151
1152 * Copy `step7_quote.qx` to `step8_macros.qx`.
1153
1154
1155 You might think that the infinite power of macros would require some
1156 sort of complex mechanism, but the implementation is actually fairly
1157 simple.
1158
1159 * Add a new attribute `is_macro` to mal function types. This should
1160   default to false.
1161
1162 * Add a new special form `defmacro!`. This is very similar to the
1163   `def!` form, but before the evaluated value (mal function) is set in
1164   the environment, the `is_macro` attribute should be set to true.
1165
1166 * Add a `is_macro_call` function: This function takes arguments `ast`
1167   and `env`. It returns true if `ast` is a list that contains a symbol
1168   as the first element and that symbol refers to a function in the
1169   `env` environment and that function has the `is_macro` attribute set
1170   to true. Otherwise, it returns false.
1171
1172 * Add a `macroexpand` function: This function takes arguments `ast`
1173   and `env`. It calls `is_macro_call` with `ast` and `env` and loops
1174   while that condition is true. Inside the loop, the first element of
1175   the `ast` list (a symbol), is looked up in the environment to get
1176   the macro function. This macro function is then called/applied with
1177   the rest of the `ast` elements (2nd through the last) as arguments.
1178   The return value of the macro call becomes the new value of `ast`.
1179   When the loop completes because `ast` no longer represents a macro
1180   call, the current value of `ast` is returned.
1181
1182 * In the evaluator (`EVAL`) before the special forms switch (apply
1183   section), perform macro expansion by calling the `macroexpand`
1184   function with the current value of `ast` and `env`. Set `ast` to the
1185   result of that call. If the new value of `ast` is no longer a list
1186   after macro expansion, then return `ast`, otherwise continue with
1187   the rest of the apply section (special forms switch).
1188
1189 * Add a new special form condition for `macroexpand`. Call the
1190   `macroexpand` function using the first `ast` argument (second list
1191   element) and `env`. Return the result. This special form allows
1192   a mal program to do explicit macro expansion without applying the
1193   result (which can be useful for debugging macro expansion).
1194
1195 Now go to the top level, run the step 8 tests:
1196 ```
1197 make "test^quux^step8"
1198 ```
1199
1200 There is a reasonably good chance that the macro tests will not pass
1201 the first time. Although the implementation of macros is fairly
1202 simple, debugging runtime bugs with macros can be fairly tricky. If
1203 you do run into subtle problems that are difficult to solve, let me
1204 recommend a couple of approaches:
1205
1206 * Use the macroexpand special form to eliminate one of the layers of
1207   indirection (to expand but skip evaluate). This will often reveal
1208   the source of the issue.
1209 * Add a debug print statement to the top of your main `eval` function
1210   (inside the TCO loop) to print the current value of `ast` (hint use
1211   `pr_str` to get easier to debug output). Pull up the step8
1212   implementation from another language and uncomment its `eval`
1213   function (yes, I give you permission to violate the rule this once).
1214   Run the two side-by-side. The first difference is likely to point to
1215   the bug.
1216
1217 Congratulations! You now have a Lisp interpreter with a super power
1218 that most non-Lisp languages can only dream of (I have it on good
1219 authority that languages dream when you are not using them). If you
1220 are not already familiar with Lisp macros, I suggest the following
1221 excercise: write a recursive macro that handles postfixed mal code
1222 (with the function as the last parameter instead of the first). Or
1223 not. I have not actually done so myself, but I have heard it is an
1224 interesting excercise.
1225
1226 In the next step you will add try/catch style exception handling to
1227 your implementation in addition to some new core functions. After
1228 step9 you will be very close to having a fully self-hosting mal
1229 implementation. Let us continue!
1230
1231
1232 ### Deferrable
1233
1234 * Add the following new core functions which are frequently used in
1235   macro functions:
1236   * `nth`: this function takes a list (or vector) and a number (index)
1237     as arguments, returns the element of the list at the given index.
1238     If the index is out of range, this function raises an exception.
1239   * `first`: this function takes a list (or vector) as its argument
1240     and return the first element. If the list (or vector) is empty or
1241     is `nil` then `nil` is returned.
1242   * `rest`: this function takes a list (or vector) as its argument and
1243     returns a new list containing all the elements except the first.
1244
1245 * In the main program, use the `rep` function to define two new
1246   control structures macros. Here are the string arguments for `rep`
1247   to define these macros:
1248   * `cond`: "(defmacro! cond (fn* (& xs) (if (> (count xs) 0) (list 'if (first xs) (if (> (count xs) 1) (nth xs 1) (throw \"odd number of forms to cond\")) (cons 'cond (rest (rest xs)))))))"
1249   * `or`: "(defmacro! or (fn* (& xs) (if (empty? xs) nil (if (= 1 (count xs)) (first xs) `(let* (or_FIXME ~(first xs)) (if or_FIXME or_FIXME (or ~@(rest xs))))))))"
1250
1251
1252 <a name="step9"></a>
1253
1254 ### Step 9: Try
1255
1256 ![step9_try architecture](step9_try.png)
1257
1258 In this step you will implement the final mal special form for
1259 error/exception handling: `try*/catch*`. You will also add several core
1260 functions to your implementation. In particular, you will enhance the
1261 functional programming pedigree of your implementation by adding the
1262 `apply` and `map` core functions.
1263
1264 Compare the pseudocode for step 8 and step 9 to get a basic idea of
1265 the changes that will be made during this step:
1266 ```
1267 diff -urp ../process/step8_macros.txt ../process/step9_try.txt
1268 ```
1269
1270 * Copy `step8_macros.qx` to `step9_try.qx`.
1271
1272 * Add the `try*/catch*` special form to the EVAL function. The
1273   try catch form looks like this: `(try* A (catch* B C))`. The form
1274   `A` is evaluated, if it throws an exception, then form `C` is
1275   evaluated with a new environment that binds the symbol B to the
1276   value of the exception that was thrown.
1277   * If your target language has built-in try/catch style exception
1278     handling then you are already 90% of the way done. Add a
1279     (native language) try/catch block that calls evaluates `A` within
1280     the try block and catches all exceptions. If an exception is
1281     caught, then translate it to a mal type/value. For native
1282     exceptions this is either the message string or a mal hash-map
1283     that contains the message string and other attributes of the
1284     exception. When a regular mal types/values is used as an
1285     exception, you will probably need to store it within a native
1286     exception type in order to be able to convey/transport it using
1287     the native try/catch mechanism. Then you will extract the mal
1288     type/value from the native exception. Create a new mal environment
1289     that binds B to the value of the exception. Finally, evaluate `C`
1290     using that new environment.
1291   * If your target language does not have built-in try/catch style
1292     exception handling then you have some extra work to do. One of the
1293     most straightforward approaches is to create a a global error
1294     variable that stores the thrown mal type/value. The complication
1295     is that there are a bunch of places where you must check to see if
1296     the global error state is set and return without proceeding. The
1297     rule of thumb is that this check should happen at the top of your
1298     EVAL function and also right after any call to EVAL (and after any
1299     function call that might happen to call EVAL further down the
1300     chain). Yes, it is ugly, but you were warned in the section on
1301     picking a language.
1302
1303 * Add the `throw` core function.
1304   * If your language supports try/catch style exception handling, then
1305     this function takes a mal type/value and throws/raises it as an
1306     exception. In order to do this, you may need to create a custom
1307     exception object that wraps a mal value/type.
1308   * If your language does not support try/catch style exception
1309     handling, then set the global error state to the mal type/value.
1310
1311 * Add the `apply` and `map` core functions. In step 5, if you did not
1312   add the original function (`fn`) to the structure returned from
1313   `fn*`, the you will need to do so now.
1314   * `apply`: takes at least two arguments. The first argument is
1315     a function and the last argument is list (or vector). The
1316     arguments between the function and the last arguemnt (if there are
1317     any) are concatenated with the final argument to create the
1318     arguments that are used to call the function. The apply
1319     function allows a function to be called with arguments that are
1320     contained in a list (or vector). In other words, `(apply F A B [C
1321     D])` is equivalent to `(F A B C D)`.
1322   * `map`: takes a function and a list (or vector) and evaluates the
1323     function against every element of the list (or vector) one at
1324     a time and returns the results as a list.
1325
1326 * Add some type predicates core functions. In Lisp, predicates are
1327   functions that return true/false (or true value/nil) and typically
1328   end in "?" or "p".
1329   * `nil?`: takes a single argument and returns true (mal true value)
1330     if the argument is nil (mal nil value).
1331   * `true?`: takes a single argument and returns true (mal true value)
1332     if the argument is a true value (mal true value).
1333   * `false?`: takes a single argument and returns true (mal true
1334     value) if the argument is a false value (mal false value).
1335   * `symbol?`: takes a single argument and returns true (mal true
1336     value) if the argument is a symbol (mal symbol value).
1337
1338 Now go to the top level, run the step 9 tests:
1339 ```
1340 make "test^quux^step9"
1341 ```
1342
1343 Your mal implementation is now essentially a fully featured Lisp
1344 interpreter. But if you stop now you will miss one of the most
1345 satisfying and enlightening aspects of creating a mal implementation:
1346 self-hosting.
1347
1348 ### Deferrable
1349
1350 * Add the following new core functions:
1351   * `symbol`: takes a string and returns a new symbol with the string
1352     as its name.
1353   * `keyword`: takes a string and returns a keyword with the same name
1354     (usually just be prepending the special keyword
1355     unicode symbol). This function should also detect if the argument
1356     is already a keyword and just return it.
1357   * `keyword?`: takes a single argument and returns true (mal true
1358     value) if the argument is a keyword, otherwise returns false (mal
1359     false value).
1360   * `vector`: takes a variable number of arguments and returns
1361     a vector containing those arguments.
1362   * `vector?`: takes a single argument and returns true (mal true
1363     value) if the argument is a vector, otherwise returns false (mal
1364     false value).
1365   * `hash-map`: takes a variable but even number of arguments and
1366     returns a new mal hash-map value with keys from the odd arguments
1367     and values from the even arguments respectively. This is basically
1368     the functional form of the `{}` reader literal syntax.
1369   * `map?`: takes a single argument and returns true (mal true
1370     value) if the argument is a hash-map, otherwise returns false (mal
1371     false value).
1372   * `assoc`: takes a hash-map as the first argument and the remaining
1373     arguments are odd/even key/value pairs to "associate" (merge) into
1374     the hash-map. Note that the original hash-map is unchanged
1375     (remember, mal values are immutable), and a new hash-map
1376     containing the old hash-maps key/values plus the merged key/value
1377     arguments is returned.
1378   * `dissoc`: takes a hash-map and a list of keys to remove from the
1379     hash-map. Again, note that the original hash-map is unchanged and
1380     a new hash-map with the keys removed is returned. Key arguments
1381     that do not exist in the hash-map are ignored.
1382   * `get`: takes a hash-map and a key and returns the value of looking
1383     up that key in the hash-map. If the key is not found in the
1384     hash-map then nil is returned.
1385   * `contains?`: takes a hash-map and a key and returns true (mal true
1386     value) if the key exists in the hash-map and false (mal false
1387     value) otherwise.
1388   * `keys`: takes a hash-map and returns a list (mal list value) of
1389     all the keys in the hash-map.
1390   * `vals`: takes a hash-map and returns a list (mal list value) of
1391     all the values in the hash-map.
1392   * `sequential?`: takes a single arguments and returns true (mal true
1393     value) if it is a list or a vector, otherwise returns false (mal
1394     false value).
1395
1396
1397 <a name="stepA"></a>
1398
1399 ### Step A: Mutation, Self-hosting and Interop
1400
1401 ![stepA_mal architecture](stepA_mal.png)
1402
1403 You have reached the final step of your mal implementation. This step
1404 is kind of a catchall for things that did not fit into other steps.
1405 But most importantly, the changes you make in this step will unlock
1406 the magical power known as "self-hosting". You might have noticed
1407 that one of the languages that mal is implemented in is "mal". Any mal
1408 implementation that is complete enough can run the mal implementation
1409 of mal. You might need to pull out your hammock and ponder this for
1410 a while if you have never built a compiler or interpreter before. Look
1411 at the step source files for the mal implementation of mal (it is not
1412 cheating now that you have reached step A).
1413
1414 If you deferred the implementation of keywords, vectors and hash-maps,
1415 now is the time to go back and implement them if you want your
1416 implementation to self-host.
1417
1418 Compare the pseudocode for step 9 and step A to get a basic idea of
1419 the changes that will be made during this step:
1420 ```
1421 diff -urp ../process/step9_try.txt ../process/stepA_mal.txt
1422 ```
1423
1424 * Copy `step9_try.qx` to `stepA_mal.qx`.
1425
1426 * Add meta-data support to mal functions. TODO. Should be separate
1427   from the function macro flag.
1428
1429 * Add the `readline` core function. TODO
1430
1431
1432 Now go to the top level, run the step A tests:
1433 ```
1434 make "test^quux^stepA"
1435 ```
1436
1437 Once you have passed all the non-optional step A tests, it is time to
1438 try self-hosting. Run your step A implementation as normal, but use
1439 the file argument mode you added in step 6 to run a each of the step
1440 from the mal implementation:
1441 ```
1442 ./stepA_mal.qx ../mal/step1_read_print.mal
1443 ./stepA_mal.qx ../mal/step2_eval.mal
1444 ...
1445 ./stepA_mal.qx ../mal/step9_try.mal
1446 ./stepA_mal.qx ../mal/stepA_mal.mal
1447 ```
1448
1449 There is a very good chance that you will encounter an error at some
1450 point while trying to run the mal in mal implementation steps above.
1451 Debugging failures that happen while self-hosting is MUCH more
1452 difficult and mind bending. One of the best approaches I have
1453 personally found is to add prn statements to the mal implementation
1454 step (not your own implementation of mal) that is causing problems.
1455
1456 Another approach I have frequently used is to pull out the code from
1457 the mal implementation that is causing the problem and simplify it
1458 step by step until you have a simple piece of mal code that still
1459 reproduces the problem. Once the reproducer is simple enough you will
1460 probably know where in your own implementation that problem is likely
1461 to be. Please add your simple reproducer as a test case so that future
1462 implementers will fix similar issues in their code before they get to
1463 self-hosting when it is much more difficult to track down and fix.
1464
1465 Once you can manually run all the self-hosted steps, it is time to run
1466 all the tests in self-hosted mode:
1467 ```
1468 make MAL_IMPL=quux "test^mal"
1469 ```
1470
1471 When you run into problems (which you almost certainly will), use the
1472 same process described above to debug them.
1473
1474 Congratulations!!! When all the tests pass, you should pause for
1475 a moment and consider what you have accomplished. You have implemented
1476 a Lisp interpreter that is powerful and complete enough to run a large
1477 mal program which is itself an implementation of the mal language. You
1478 might even be asking if you can continue the "inception" by using your
1479 implementation to run a mal implementation which itself runs the mal
1480 implementation.
1481
1482
1483 ### Optional: gensym
1484
1485 The `or` macro we introduced at step 8 has a bug. It defines a
1486 variable called `or_FIXME`, which "shadows" such a binding from the
1487 user's code (which uses the macro). If a user has a variable called
1488 `or_FIXME`, it cannot be used as an `or` macro argument. In order to
1489 fix that, we'll introduce `gensym`: a function which returns a symbol
1490 which was never used before anywhere in the program. This is also an
1491 example for the use of mal atoms to keep state (the state here being
1492 the number of symbols produced by `gensym` so far).
1493
1494 Previously you used `rep` to define the `or` macro. Remove that
1495 definition and use `rep` to define the new counter, `gensym` function
1496 and the clean `or` macro. Here are the string arguments you need to
1497 pass to `rep`:
1498 ```
1499 "(def! *gensym-counter* (atom 0))"
1500
1501 "(def! gensym (fn* [] (symbol (str \"G__\" (swap! *gensym-counter* (fn* [x] (+ 1 x)))))))"
1502
1503 "(defmacro! or (fn* (& xs) (if (empty? xs) nil (if (= 1 (count xs)) (first xs) (let* (condvar (gensym)) `(let* (~condvar ~(first xs)) (if ~condvar ~condvar (or ~@(rest xs)))))))))"
1504 ```
1505
1506 For extra information read [Peter Seibel's thorough discussion about
1507 `gensym` and leaking macros in Common Lisp](http://www.gigamonkeys.com/book/macros-defining-your-own.html#plugging-the-leaks).
1508
1509
1510 ### Optional additions
1511
1512 * Add metadata support to composite data types, symbols and native
1513   functions. TODO
1514 * `time-ms` TODO
1515 * `conj` TODO
1516
1517
1518 ## TODO:
1519
1520 * simplify: "X argument (list element Y)" -> ast[Y]
1521 * list of types with metadata: list, vector, hash-map, mal functions
1522 * more clarity about when to peek and poke in read_list and read_form
1523 * tokenizer: use first group rather than whole match (to eliminate
1524   whitespace/commas)