process/guide.md

   1 # The Make-A-Lisp Process
   2
   3 So you want to write a Lisp interpreter? Welcome!
   4
   5 The goal of the Make-A-Lisp project is to make it easy to write your
   6 own Lisp interpreter without sacrificing those many "Aha!" moments
   7 that come from ascending the McCarthy mountain. When you reach the peak
   8 of this particular mountain, you will have an interpreter for the mal
   9 Lisp language that is powerful enough to be self-hosting, meaning it
  10 will be able to run a mal interpreter written in mal itself.
  11
  12 So jump right in (er ... start the climb)!
  13
  14 - [Pick a language](#pick-a-language)
  15 - [Getting started](#getting-started)
  16 - [General hints](#general-hints)
  17 - [The Make-A-Lisp Process](#the-make-a-lisp-process-1)
  18   - [Step 0: The REPL](#step-0-the-repl)
  19   - [Step 1: Read and Print](#step-1-read-and-print)
  20   - [Step 2: Eval](#step-2-eval)
  21   - [Step 3: Environments](#step-3-environments)
  22   - [Step 4: If Fn Do](#step-4-if-fn-do)
  23   - [Step 5: Tail call optimization](#step-5-tail-call-optimization)
  24   - [Step 6: Files, Mutation, and Evil](#step-6-files-mutation-and-evil)
  25   - [Step 7: Quoting](#step-7-quoting)
  26   - [Step 8: Macros](#step-8-macros)
  27   - [Step 9: Try](#step-9-try)
  28   - [Step A: Metadata, Self-hosting and Interop](#step-a-metadata-self-hosting-and-interop)
  29
  30
  31 ## Pick a language
  32
  33 You might already have a language in mind that you want to use.
  34 Technically speaking, mal can be implemented in any sufficiently
  35 complete programming language (i.e. Turing complete), however, there
  36 are a few language features that can make the task MUCH easier. Here
  37 are some of them in rough order of importance:
  38
  39 * A sequential compound data structure (e.g. arrays, lists,
  40   vectors, etc)
  41 * An associative compound data structure (e.g. a dictionary,
  42   hash-map, associative array, etc)
  43 * Function references (first class functions, function pointers,
  44   etc)
  45 * Real exception handling (try/catch, raise, throw, etc)
  46 * Variable argument functions (variadic, var args, splats, apply, etc)
  47 * Function closures
  48 * PCRE regular expressions
  49
  50 In addition, the following will make your task especially easy:
  51
  52 * Dynamic typing / boxed types (specifically, the ability to store
  53   different data types in the sequential and associative structures
  54   and the language keeps track of the type for you)
  55 * Compound data types support arbitrary runtime "hidden" data
  56   (metadata, metatables, dynamic fields attributes)
  57
  58 Here are some examples of languages that have all of the above
  59 features: JavaScript, Ruby, Python, Lua, R, Clojure.
  60
  61 Michael Fogus has some great blog posts on interesting but less well
  62 known languages and many of the languages on his lists do not yet have
  63 any mal implementations:
  64 * http://blog.fogus.me/2011/08/14/perlis-languages/
  65 * http://blog.fogus.me/2011/10/18/programming-language-development-the-past-5-years/
  66
  67 Many of the most popular languages already have Mal implementations.
  68 However, this should not discourage you from creating your own
  69 implementation in a language that already has one. However, if you go
  70 this route, I suggest you avoid referring to the existing
  71 implementations (i.e. "cheating") to maximize your learning experience
  72 instead of just borrowing mine. On the other hand, if your goal is to
  73 add new implementations to mal as efficiently as possible, then you
  74 SHOULD find the most similar target language implementation and refer
  75 to it frequently.
  76
  77 If you want a list of programming languages with an
  78 approximate measure of popularity try the [RedMonk Programming
  79 Language
  80 Rankings](https://redmonk.com/sogrady/2019/03/20/language-rankings-1-19/)
  81 or the [GitHut 2.0 Project](https://madnight.github.io/githut).
  82
  83
  84 ## Getting started
  85
  86 * Install your chosen language interpreter/compiler, language package
  87   manager and build tools (if applicable)
  88
  89 * Fork the mal repository on github and then clone your forked
  90   repository:
  91 ```
  92 git clone git@github.com:YOUR_NAME/mal.git
  93 cd mal
  94 ```
  95
  96 * Make a new directory for your implementation. For example, if your
  97 language is called "quux":
  98 ```
  99 mkdir quux
 100 ```
 101
 102 * Modify the top level Makefile to allow the tests to be run against
 103   your implementation. For example, if your language is named "quux"
 104   and uses "qx" as the file extension, then make the following
 105   3 modifications to Makefile:
 106 ```
 107 IMPLS = ... quux ...
 108 ...
 109 quux_STEP_TO_PROG = mylang/$($(1)).qx
 110 ```
 111
 112 * Add a "run" script to you implementation directory that listens to
 113   the "STEP" environment variable for the implementation step to run
 114   and defaults to "stepA_mal". Make sure the run script has the
 115   executable file permission set (or else the test runner might fail with a
 116   permission denied error message). The following are examples of "run"
 117   scripts for a compiled language and an interpreted language (where
 118   the interpreter is named "quux"):
 119
 120 ```
 121 #!/bin/bash
 122 exec $(dirname $0)/${STEP:-stepA_mal} "${@}"
 123 ```
 124
 125 ```
 126 #!/bin/bash
 127 exec quux $(dirname $0)/${STEP:-stepA_mal}.qx "${@}"
 128 ```
 129
 130 This allows you to run tests against your implementation like this:
 131 ```
 132 make "test^quux^stepX"
 133 ```
 134
 135 If your implementation language is a compiled language, then you
 136 should also add a Makefile at the top level of your implementation
 137 directory. This Makefile will define how to build the files pointed to
 138 by the quux_STEP_TO_PROG macro. The top-level Makefile will attempt to
 139 build those targets before running tests. If it is a scripting
 140 language/uncompiled, then no Makefile is necessary because
 141 quux_STEP_TO_PROG will point to a source file that already exists and
 142 does not need to be compiled/built.
 143
 144
 145 ## General hints
 146
 147 Stackoverflow and Google are your best friends. Modern polyglot
 148 developers do not memorize dozens of programming languages. Instead,
 149 they learn the peculiar terminology used with each language and then
 150 use this to search for their answers.
 151
 152 Here are some other resources where multiple languages are
 153 compared/described:
 154 * http://learnxinyminutes.com/
 155 * http://hyperpolyglot.org/
 156 * http://rosettacode.org/
 157 * http://rigaux.org/language-study/syntax-across-languages/
 158
 159 Do not let yourself be bogged down by specific problems. While the
 160 make-a-lisp process is structured as a series of steps, the reality is
 161 that building a lisp interpreter is more like a branching tree. If you
 162 get stuck on tail call optimization, or hash-maps, move on to other
 163 things. You will often have a stroke of inspiration for a problem as
 164 you work through other functionality. I have tried to structure this
 165 guide and the tests to make clear which things can be deferred until
 166 later.
 167
 168 An aside on deferrable/optional bits: when you run the tests for
 169 a given step, the last tests are often marked with an "optional"
 170 header. This indicates that these are tests for functionality that is
 171 not critical to finish a basic mal implementation. Many of the steps
 172 in this process guide have a "Deferrable" section, however, it is not
 173 quite the same meaning. Those sections include the functionality that
 174 is marked as optional in the tests, but they also include
 175 functionality that becomes mandatory at a later step. In other words,
 176 this is a "make your own Lisp adventure".
 177
 178 Use test driven development. Each step of the make-a-lisp process has
 179 a bunch of tests associated with it and there is an easy script to run
 180 all the tests for a specific step in the process. Pick a failing test,
 181 fix it, repeat until all the tests for that step pass.
 182
 183 ## Reference Code
 184
 185 The `process` directory contains abbreviated pseudocode and
 186 architecture diagrams for each step of the make-a-lisp process. Use
 187 a textual diff/comparison tool to compare the previous pseudocode step
 188 with the one you are working on. The architecture diagram images have
 189 changes from the previous step highlighted in red. There is also
 190 a concise
 191 [cheatsheet](http://kanaka.github.io/mal/cheatsheet.html) that
 192 summarizes the key changes at each step.
 193
 194 If you get completely stuck and are feeling like giving up, then you
 195 should "cheat" by referring to the same step or functionality in
 196 a existing implementation language. You are here to learn, not to take
 197 a test, so do not feel bad about it. Okay, you should feel a little
 198 bit bad about it.
 199
 200
 201 ## The Make-A-Lisp Process
 202
 203 Feel free to follow the guide as literally or as loosely as you
 204 like. You are here to learn; wandering off the beaten path may be the
 205 way you learn best. However, each step builds on the previous steps,
 206 so if you are new to Lisp or new to your implementation language then
 207 you may want to stick more closely to the guide your first time
 208 through to avoid frustration at later steps.
 209
 210 In the steps that follow the name of the target language is "quux" and
 211 the file extension for that language is "qx".
 212
 213
 214 <a name="step0"></a>
 215
 216 ### Step 0: The REPL
 217
 218 ![step0_repl architecture](step0_repl.png)
 219
 220 This step is basically just creating a skeleton of your interpreter.
 221
 222 * Create a `step0_repl.qx` file in `quux/`.
 223
 224 * Add the 4 trivial functions `READ`, `EVAL`, `PRINT`, and `rep`
 225   (read-eval-print). `READ`, `EVAL`, and `PRINT` are basically just
 226   stubs that return their first parameter (a string if your target
 227   language is a statically typed) and `rep` calls them in order
 228   passing the return to the input of the next.
 229
 230 * Add a main loop that repeatedly prints a prompt (needs to be
 231   "user> " for later tests to pass), gets a line of input from the
 232   user, calls `rep` with that line of input, and then prints out the
 233   result from `rep`. It should also exit when you send it an EOF
 234   (often Ctrl-D).
 235
 236 * If you are using a compiled (ahead-of-time rather than just-in-time)
 237   language, then create a Makefile (or appropriate project definition
 238   file) in your directory.
 239
 240 It is time to run your first tests. This will check that your program
 241 does input and output in a way that can be captured by the test
 242 harness. Go to the top level and run the following:
 243 ```
 244 make "test^quux^step0"
 245 ```
 246
 247 Add and then commit your new `step0_repl.qx` and `Makefile` to git.
 248
 249 Congratulations! You have just completed the first step of the
 250 make-a-lisp process.
 251
 252
 253 #### Optional:
 254
 255 * Add full line editing and command history support to your
 256   interpreter REPL. Many languages have a library/module that provide
 257   line editing support. Another option if your language supports it is
 258   to use an FFI (foreign function interface) to load and call directly
 259   into GNU readline, editline, or linenoise library. Add line
 260   editing interface code to `readline.qx`
 261
 262
 263 <a name="step1"></a>
 264
 265 ### Step 1: Read and Print
 266
 267 ![step1_read_print architecture](step1_read_print.png)
 268
 269 In this step, your interpreter will "read" the string from the user
 270 and parse it into an internal tree data structure (an abstract syntax
 271 tree) and then take that data structure and "print" it back to
 272 a string.
 273
 274 In non-lisp languages, this step (called "lexing and parsing") can be
 275 one of the most complicated parts of the compiler/interpreter. In
 276 Lisp, the data structure that you want in memory is basically
 277 represented directly in the code that the programmer writes
 278 (homoiconicity).
 279
 280 For example, if the string is "(+ 2 (* 3 4))" then the read function
 281 will process this into a tree structure that looks like this:
 282 ```
 283           List
 284          / |  \
 285         /  |   \
 286        /   |    \
 287   Sym:+  Int:2  List
 288                / |  \
 289               /  |   \
 290              /   |    \
 291          Sym:*  Int:3  Int:4
 292 ```
 293
 294 Each left paren and its matching right paren (lisp "sexpr") becomes
 295 a node in the tree and everything else becomes a leaf in the tree.
 296
 297 If you can find code for an implementation of a JSON encoder/decoder
 298 in your target language then you can probably just borrow and modify
 299 that and be 75% of the way done with this step.
 300
 301 The rest of this section is going to assume that you are not starting
 302 from an existing JSON encoder/decoder, but that you do have access to
 303 a Perl compatible regular expressions (PCRE) module/library. You can
 304 certainly implement the reader using simple string operations, but it
 305 is more involved. The `make`, `ps` (postscript) and Haskell
 306 implementations have examples of a reader/parser without using regular
 307 expression support.
 308
 309 * Copy `step0_repl.qx` to `step1_read_print.qx`.
 310
 311 * Add a `reader.qx` file to hold functions related to the reader.
 312
 313 * If the target language has objects types (OOP), then the next step
 314   is to create a simple stateful Reader object in `reader.qx`. This
 315   object will store the tokens and a position. The Reader object will
 316   have two methods: `next` and `peek`. `next` returns the token at
 317   the current position and increments the position. `peek` just
 318   returns the token at the current position.
 319
 320 * Add a function `read_str` in `reader.qx`. This function
 321   will call `tokenize` and then create a new Reader object instance
 322   with the tokens. Then it will call `read_form` with the Reader
 323   instance.
 324
 325 * Add a function `tokenize` in `reader.qx`. This function will take
 326   a single string and return an array/list
 327   of all the tokens (strings) in it. The following regular expression
 328   (PCRE) will match all mal tokens.
 329 ```
 330 [\s,]*(~@|[\[\]{}()'`~^@]|"(?:\\.|[^\\"])*"?|;.*|[^\s\[\]{}('"`,;)]*)
 331 ```
 332 * For each match captured within the parenthesis starting at char 6 of the
 333   regular expression a new token will be created.
 334
 335   * `[\s,]*`: Matches any number of whitespaces or commas. This is not captured
 336     so it will be ignored and not tokenized.
 337
 338   * `~@`: Captures the special two-characters `~@` (tokenized).
 339
 340   * ```[\[\]{}()'`~^@]```: Captures any special single character, one of
 341     ```[]{}()'`~^@``` (tokenized).
 342
 343   * `"(?:\\.|[^\\"])*"?`: Starts capturing at a double-quote and stops at the
 344     next double-quote unless it was preceded by a backslash in which case it
 345     includes it until the next double-quote (tokenized). It will also
 346     match unbalanced strings (no ending double-quote) which should be
 347     reported as an error.
 348
 349   * `;.*`: Captures any sequence of characters starting with `;` (tokenized).
 350
 351   * ```[^\s\[\]{}('"`,;)]*```: Captures a sequence of zero or more non special
 352     characters (e.g. symbols, numbers, "true", "false", and "nil") and is sort
 353     of the inverse of the one above that captures special characters (tokenized).
 354
 355 * Add the function `read_form` to `reader.qx`. This function
 356   will peek at the first token in the Reader object and switch on the
 357   first character of that token. If the character is a left paren then
 358   `read_list` is called with the Reader object. Otherwise, `read_atom`
 359   is called with the Reader Object. The return value from `read_form`
 360   is a mal data type. If your target language is statically typed then
 361   you will need some way for `read_form` to return a variant or
 362   subclass type. For example, if your language is object oriented,
 363   then you can define a top level MalType (in `types.qx`) that all
 364   your mal data types inherit from. The MalList type (which also
 365   inherits from MalType) will contain a list/array of other MalTypes.
 366   If your language is dynamically typed then you can likely just
 367   return a plain list/array of other mal types.
 368
 369 * Add the function `read_list` to `reader.qx`. This function will
 370   repeatedly call `read_form` with the Reader object until it
 371   encounters a ')' token (if it reach EOF before reading a ')' then
 372   that is an error). It accumulates the results into a List type.  If
 373   your language does not have a sequential data type that can hold mal
 374   type values you may need to implement one (in `types.qx`).  Note
 375   that `read_list` repeatedly calls `read_form` rather than
 376   `read_atom`. This mutually recursive definition between `read_list`
 377   and `read_form` is what allows lists to contain lists.
 378
 379 * Add the function `read_atom` to `reader.qx`. This function will
 380   look at the contents of the token and return the appropriate scalar
 381   (simple/single) data type value. Initially, you can just implement
 382   numbers (integers) and symbols. This will allow you to proceed
 383   through the next couple of steps before you will need to implement
 384   the other fundamental mal types: nil, true, false, and string. The
 385   remaining scalar mal type, keyword does not
 386   need to be implemented until step A (but can be implemented at any
 387   point between this step and that). BTW, symbols types are just an
 388   object that contains a single string name value (some languages have
 389   symbol types already).
 390
 391 * Add a file `printer.qx`. This file will contain a single function
 392   `pr_str` which does the opposite of `read_str`: take a mal data
 393   structure and return a string representation of it. But `pr_str` is
 394   much simpler and is basically just a switch statement on the type of
 395   the input object:
 396
 397   * symbol: return the string name of the symbol
 398   * number: return the number as a string
 399   * list: iterate through each element of the list calling `pr_str` on
 400     it, then join the results with a space separator, and surround the
 401     final result with parens
 402
 403 * Change the `READ` function in `step1_read_print.qx` to call
 404   `reader.read_str` and the `PRINT` function to call `printer.pr_str`.
 405   `EVAL` continues to simply return its input but the type is now
 406   a mal data type.
 407
 408 You now have enough hooked up to begin testing your code. You can
 409 manually try some simple inputs:
 410   * `123` -> `123`
 411   * `   123  ` -> `123`
 412   * `abc` -> `abc`
 413   * `   abc   ` -> `abc`
 414   * `(123 456)` -> `(123 456)`
 415   * `(  123   456 789   )   ` -> `(123 456 789)`
 416   * `(  + 2   (*  3  4)  )  ` -> `(+ 2 (* 3 4))`
 417
 418 To verify that your code is doing more than just eliminating extra
 419 spaces (and not failing), you can instrument your `reader.qx` functions.
 420
 421 Once you have gotten past those simple manual tests, it is time to run
 422 the full suite of step 1 tests. Go to the top level and run the
 423 following:
 424 ```
 425 make "test^quux^step1"
 426 ```
 427
 428 Fix any test failures related to symbols, numbers and lists.
 429
 430 Depending on the functionality of your target language, it is likely
 431 that you have now just completed one of the most difficult steps. It
 432 is down hill from here. The remaining steps will probably be easier
 433 and each step will give progressively more bang for the buck.
 434
 435 #### Deferrable:
 436
 437
 438 * Add support for the other basic data type to your reader and printer
 439   functions: string, nil, true, and false. Nil, true, and false
 440   become mandatory at step 4, strings at step 6. When a string is read,
 441   the following transformations are
 442   applied: a backslash followed by a doublequote is translated into
 443   a plain doublequote character, a backslash followed by "n" is
 444   translated into a newline, and a backslash followed by another
 445   backslash is translated into a single backslash. To properly print
 446   a string (for step 4 string functions), the `pr_str` function needs
 447   another parameter called `print_readably`.  When `print_readably` is
 448   true, doublequotes, newlines, and backslashes are translated into
 449   their printed representations (the reverse of the reader). The
 450   `PRINT` function in the main program should call `pr_str` with
 451   print_readably set to true.
 452
 453 * Add error checking to your reader functions to make sure parens
 454   are properly matched. Catch and print these errors in your main
 455   loop. If your language does not have try/catch style bubble up
 456   exception handling, then you will need to add explicit error
 457   handling to your code to catch and pass on errors without crashing.
 458
 459 * Add support for reader macros which are forms that are
 460   transformed into other forms during the read phase. Refer to
 461   `tests/step1_read_print.mal` for the form that these macros should
 462   take (they are just simple transformations of the token stream).
 463
 464 * Add support for the other mal types: keyword, vector, hash-map.
 465   * keyword: a keyword is a token that begins with a colon. A keyword
 466     can just be stored as a string with special unicode prefix like
 467     0x29E (or char 0xff/127 if the target language does not have good
 468     unicode support) and the printer translates strings with that
 469     prefix back to the keyword representation. This makes it easy to
 470     use keywords as hash map keys in most languages. You can also
 471     store keywords as a unique data type, but you will need to make
 472     sure they can be used as hash map keys (which may involve doing
 473     a similar prefixed translation anyways).
 474   * vector: a vector can be implemented with same underlying
 475     type as a list as long as there is some mechanism to keep track of
 476     the difference. You can use the same reader function for both
 477     lists and vectors by adding parameters for the starting and ending
 478     tokens.
 479   * hash-map: a hash-map is an associative data structure that maps
 480     strings to other mal values. If you implement keywords as prefixed
 481     strings, then you only need a native associative data structure
 482     which supports string keys. Clojure allows any value to be a hash
 483     map key, but the base functionality in mal is to support strings
 484     and keyword keys. Because of the representation of hash-maps as
 485     an alternating sequence of keys and values, you can probably use
 486     the same reader function for hash-maps as lists and vectors with
 487     parameters to indicate the starting and ending tokens. The odd
 488     tokens are then used for keys with the corresponding even tokens
 489     as the values.
 490
 491 * Add comment support to your reader. The tokenizer should ignore
 492   tokens that start with ";". Your `read_str` function will need to
 493   properly handle when the tokenizer returns no values. The simplest
 494   way to do this is to return `nil` mal value. A cleaner option (that
 495   does not print `nil` at the prompt is to throw a special exception
 496   that causes the main loop to simply continue at the beginning of the
 497   loop without calling `rep`.
 498
 499
 500 <a name="step2"></a>
 501
 502 ### Step 2: Eval
 503
 504 ![step2_eval architecture](step2_eval.png)
 505
 506 In step 1 your mal interpreter was basically just a way to validate
 507 input and eliminate extraneous white space. In this step you will turn
 508 your interpreter into a simple number calculator by adding
 509 functionality to the evaluator (`EVAL`).
 510
 511 Compare the pseudocode for step 1 and step 2 to get a basic idea of
 512 the changes that will be made during this step:
 513 ```
 514 diff -urp ../process/step1_read_print.txt ../process/step2_eval.txt
 515 ```
 516
 517 * Copy `step1_read_print.qx` to `step2_eval.qx`.
 518
 519 * Define a simple initial REPL environment. This environment is an
 520   associative structure that maps symbols (or symbol names) to
 521   numeric functions. For example, in python this would look something
 522   like this:
 523 ```
 524 repl_env = {'+': lambda a,b: a+b,
 525             '-': lambda a,b: a-b,
 526             '*': lambda a,b: a*b,
 527             '/': lambda a,b: int(a/b)}
 528 ```
 529
 530 * Modify the `rep` function to pass the REPL environment as the second
 531   parameter for the `EVAL` call.
 532
 533 * Create a new function `eval_ast` which takes `ast` (mal data type)
 534   and an associative structure (the environment from above).
 535   `eval_ast` switches on the type of `ast` as follows:
 536
 537   * symbol: lookup the symbol in the environment structure and return
 538     the value or raise an error if no value is found
 539   * list: return a new list that is the result of calling `EVAL` on
 540     each of the members of the list
 541   * otherwise just return the original `ast` value
 542
 543 * Modify `EVAL` to check if the first parameter `ast` is a list.
 544   * `ast` is not a list: then return the result of calling `eval_ast`
 545     on it.
 546   * `ast` is a empty list: return ast unchanged.
 547   * `ast` is a list: call `eval_ast` to get a new evaluated list. Take
 548     the first item of the evaluated list and call it as function using
 549     the rest of the evaluated list as its arguments.
 550
 551 If your target language does not have full variable length argument
 552 support (e.g. variadic, vararg, splats, apply) then you will need to
 553 pass the full list of arguments as a single parameter and split apart
 554 the individual values inside of every mal function. This is annoying,
 555 but workable.
 556
 557 The process of taking a list and invoking or executing it to return
 558 something new is known in Lisp as the "apply" phase.
 559
 560 Try some simple expressions:
 561
 562   * `(+ 2 3)` -> `5`
 563   * `(+ 2 (* 3 4))` -> `14`
 564
 565 The most likely challenge you will encounter is how to properly call
 566 a function references using an arguments list.
 567
 568 Now go to the top level, run the step 2 tests and fix the errors.
 569 ```
 570 make "test^quux^step2"
 571 ```
 572
 573 You now have a simple prefix notation calculator!
 574
 575 #### Deferrable:
 576
 577 * `eval_ast` should evaluate elements of vectors and hash-maps.  Add the
 578   following cases in `eval_ast`:
 579   * If `ast` is a vector: return a new vector that is the result of calling
 580     `EVAL` on each of the members of the vector.
 581   * If `ast` is a hash-map: return a new hash-map which consists of key-value
 582     pairs where the key is a key from the hash-map and the value is the result
 583     of calling `EVAL` on the corresponding value.
 584
 585
 586 <a name="step3"></a>
 587
 588 ### Step 3: Environments
 589
 590 ![step3_env architecture](step3_env.png)
 591
 592 In step 2 you were already introduced to REPL environment (`repl_env`)
 593 where the basic numeric functions were stored and looked up. In this
 594 step you will add the ability to create new environments (`let*`) and
 595 modify existing environments (`def!`).
 596
 597 A Lisp environment is an associative data structure that maps symbols (the
 598 keys) to values. But Lisp environments have an additional important
 599 function: they can refer to another environment (the outer
 600 environment). During environment lookups, if the current environment
 601 does not have the symbol, the lookup continues in the outer
 602 environment, and continues this way until the symbol is either found,
 603 or the outer environment is `nil` (the outermost environment in the
 604 chain).
 605
 606 Compare the pseudocode for step 2 and step 3 to get a basic idea of
 607 the changes that will be made during this step:
 608 ```
 609 diff -urp ../process/step2_eval.txt ../process/step3_env.txt
 610 ```
 611
 612 * Copy `step2_eval.qx` to `step3_env.qx`.
 613
 614 * Create `env.qx` to hold the environment definition.
 615
 616 * Define an `Env` object that is instantiated with a single `outer`
 617   parameter and starts with an empty associative data structure
 618   property `data`.
 619
 620 * Define three methods for the Env object:
 621   * set: takes a symbol key and a mal value and adds to the `data`
 622     structure
 623   * find: takes a symbol key and if the current environment contains
 624     that key then return the environment. If no key is found and outer
 625     is not `nil` then call find (recurse) on the outer environment.
 626   * get: takes a symbol key and uses the `find` method to locate the
 627     environment with the key, then returns the matching value. If no
 628     key is found up the outer chain, then throws/raises a "not found"
 629     error.
 630
 631 * Update `step3_env.qx` to use the new `Env` type to create the
 632   repl_env (with a `nil` outer value) and use the `set` method to add
 633   the numeric functions.
 634
 635 * Modify `eval_ast` to call the `get` method on the `env` parameter.
 636
 637 * Modify the apply section of `EVAL` to switch on the first element of
 638   the list:
 639   * symbol "def!": call the set method of the current environment
 640     (second parameter of `EVAL` called `env`) using the unevaluated
 641     first parameter (second list element) as the symbol key and the
 642     evaluated second parameter as the value.
 643   * symbol "let\*": create a new environment using the current
 644     environment as the outer value and then use the first parameter as
 645     a list of new bindings in the "let\*" environment. Take the second
 646     element of the binding list, call `EVAL` using the new "let\*"
 647     environment as the evaluation environment, then call `set` on the
 648     "let\*" environment using the first binding list element as the key
 649     and the evaluated second element as the value. This is repeated
 650     for each odd/even pair in the binding list. Note in particular,
 651     the bindings earlier in the list can be referred to by later
 652     bindings. Finally, the second parameter (third element) of the
 653     original `let*` form is evaluated using the new "let\*" environment
 654     and the result is returned as the result of the `let*` (the new
 655     let environment is discarded upon completion).
 656   * otherwise: call `eval_ast` on the list and apply the first element
 657     to the rest as before.
 658
 659 `def!` and `let*` are Lisp "specials" (or "special atoms") which means
 660 that they are language level features and more specifically that the
 661 rest of the list elements (arguments) may be evaluated differently (or
 662 not at all) unlike the default apply case where all elements of the
 663 list are evaluated before the first element is invoked. Lists which
 664 contain a "special" as the first element are known as "special forms".
 665 They are special because they follow special evaluation rules.
 666
 667 Try some simple environment tests:
 668
 669   * `(def! a 6)` -> `6`
 670   * `a` -> `6`
 671   * `(def! b (+ a 2))` -> `8`
 672   * `(+ a b)` -> `14`
 673   * `(let* (c 2) c)` -> `2`
 674
 675 Now go to the top level, run the step 3 tests and fix the errors.
 676 ```
 677 make "test^quux^step3"
 678 ```
 679
 680 You mal implementation is still basically just a numeric calculator
 681 with save/restore capability. But you have set the foundation for step
 682 4 where it will begin to feel like a real programming language.
 683
 684
 685 An aside on mutation and typing:
 686
 687 The "!" suffix on symbols is used to indicate that this symbol refers
 688 to a function that mutates something else. In this case, the `def!`
 689 symbol indicates a special form that will mutate the current
 690 environment. Many (maybe even most) of runtime problems that are
 691 encountered in software engineering are a result of mutation. By
 692 clearly marking code where mutation may occur, you can more easily
 693 track down the likely cause of runtime problems when they do occur.
 694
 695 Another cause of runtime errors is type errors, where a value of one
 696 type is unexpectedly treated by the program as a different and
 697 incompatible type. Statically typed languages try to make the
 698 programmer solve all type problems before the program is allowed to
 699 run. Most Lisp variants tend to be dynamically typed (types of values
 700 are checked when they are actually used at runtime).
 701
 702 As an aside-aside: The great debate between static and dynamic typing
 703 can be understood by following the money. Advocates of strict static
 704 typing use words like "correctness" and "safety" and thus get
 705 government and academic funding. Advocates of dynamic typing use words
 706 like "agile" and "time-to-market" and thus get venture capital and
 707 commercial funding.
 708
 709
 710 <a name="step4"></a>
 711
 712 ### Step 4: If Fn Do
 713
 714 ![step4_if_fn_do architecture](step4_if_fn_do.png)
 715
 716 In step 3 you added environments and the special forms for
 717 manipulating environments. In this step you will add 3 new special
 718 forms (`if`, `fn*` and `do`) and add several more core functions to
 719 the default REPL environment. Our new architecture will look like
 720 this:
 721
 722 The `fn*` special form is how new user-defined functions are created.
 723 In some Lisps, this special form is named "lambda".
 724
 725 Compare the pseudocode for step 3 and step 4 to get a basic idea of
 726 the changes that will be made during this step:
 727 ```
 728 diff -urp ../process/step3_env.txt ../process/step4_if_fn_do.txt
 729 ```
 730
 731 * Copy `step3_env.qx` to `step4_if_fn_do.qx`.
 732
 733 * If you have not implemented reader and printer support (and data
 734   types) for `nil`, `true` and `false`, you will need to do so for
 735   this step.
 736
 737 * Update the constructor/initializer for environments to take two new
 738   arguments: `binds` and `exprs`. Bind (`set`) each element (symbol)
 739   of the binds list to the respective element of the `exprs` list.
 740
 741 * Add support to `printer.qx` to print functions values. A string
 742   literal like "#\<function>" is sufficient.
 743
 744 * Add the following special forms to `EVAL`:
 745
 746   * `do`: Evaluate all the elements of the list using `eval_ast`
 747     and return the final evaluated element.
 748   * `if`: Evaluate the first parameter (second element). If the result
 749     (condition) is anything other than `nil` or `false`, then evaluate
 750     the second parameter (third element of the list) and return the
 751     result.  Otherwise, evaluate the third parameter (fourth element)
 752     and return the result. If condition is false and there is no third
 753     parameter, then just return `nil`.
 754   * `fn*`: Return a new function closure. The body of that closure
 755     does the following:
 756     * Create a new environment using `env` (closed over from outer
 757       scope) as the `outer` parameter, the first parameter (second
 758       list element of `ast` from the outer scope) as the `binds`
 759       parameter, and the parameters to the closure as the `exprs`
 760       parameter.
 761     * Call `EVAL` on the second parameter (third list element of `ast`
 762       from outer scope), using the new environment. Use the result as
 763       the return value of the closure.
 764
 765 If your target language does not support closures, then you will need
 766 to implement `fn*` using some sort of structure or object that stores
 767 the values being closed over: the first and second elements of the
 768 `ast` list (function parameter list and function body) and the current
 769 environment `env`. In this case, your native functions will need to be
 770 wrapped in the same way. You will probably also need a method/function
 771 that invokes your function object/structure for the default case of
 772 the apply section of `EVAL`.
 773
 774 Try out the basic functionality you have implemented:
 775
 776   * `(fn* (a) a)` -> `#<function>`
 777   * `( (fn* (a) a) 7)` -> `7`
 778   * `( (fn* (a) (+ a 1)) 10)` -> `11`
 779   * `( (fn* (a b) (+ a b)) 2 3)` -> `5`
 780
 781 * Add a new file `core.qx` and define an associative data structure
 782   `ns` (namespace) that maps symbols to functions. Move the numeric
 783   function definitions into this structure.
 784
 785 * Modify `step4_if_fn_do.qx` to iterate through the `core.ns`
 786   structure and add (`set`) each symbol/function mapping to the
 787   REPL environment (`repl_env`).
 788
 789 * Add the following functions to `core.ns`:
 790   * `prn`: call `pr_str` on the first parameter with `print_readably`
 791     set to true, prints the result to the screen and then return
 792     `nil`. Note that the full version of `prn` is a deferrable below.
 793   * `list`: take the parameters and return them as a list.
 794   * `list?`: return true if the first parameter is a list, false
 795     otherwise.
 796   * `empty?`: treat the first parameter as a list and return true if
 797     the list is empty and false if it contains any elements.
 798   * `count`: treat the first parameter as a list and return the number
 799     of elements that it contains.
 800   * `=`: compare the first two parameters and return true if they are
 801     the same type and contain the same value. In the case of equal
 802     length lists, each element of the list should be compared for
 803     equality and if they are the same return true, otherwise false.
 804   * `<`, `<=`, `>`, and `>=`: treat the first two parameters as
 805     numbers and do the corresponding numeric comparison, returning
 806     either true or false.
 807
 808 Now go to the top level, run the step 4 tests. There are a lot of
 809 tests in step 4 but all of the non-optional tests that do not involve
 810 strings should be able to pass now.
 811
 812 ```
 813 make "test^quux^step4"
 814 ```
 815
 816 Your mal implementation is already beginning to look like a real
 817 language. You have flow control, conditionals, user-defined functions
 818 with lexical scope, side-effects (if you implement the string
 819 functions), etc. However, our little interpreter has not quite reached
 820 Lisp-ness yet. The next several steps will take your implementation
 821 from a neat toy to a full featured language.
 822
 823 #### Deferrable:
 824
 825 * Implement Clojure-style variadic function parameters. Modify the
 826   constructor/initializer for environments, so that if a "&" symbol is
 827   encountered in the `binds` list, the next symbol in the `binds` list
 828   after the "&" is bound to the rest of the `exprs` list that has not
 829   been bound yet.
 830
 831 * Define a `not` function using mal itself. In `step4_if_fn_do.qx`
 832   call the `rep` function with this string:
 833   "(def! not (fn* (a) (if a false true)))".
 834
 835 * Implement the strings functions in `core.qx`. To implement these
 836   functions, you will need to implement the string support in the
 837   reader and printer (deferrable section of step 1). Each of the string
 838   functions takes multiple mal values, prints them (`pr_str`) and
 839   joins them together into a new string.
 840   * `pr-str`: calls `pr_str` on each argument with `print_readably`
 841     set to true, joins the results with " " and returns the new
 842     string.
 843   * `str`: calls `pr_str` on each argument with `print_readably` set
 844     to false, concatenates the results together ("" separator), and
 845     returns the new string.
 846   * `prn`:  calls `pr_str` on each argument with `print_readably` set
 847     to true, joins the results with " ", prints the string to the
 848     screen and then returns `nil`.
 849   * `println`:  calls `pr_str` on each argument with `print_readably` set
 850     to false, joins the results with " ", prints the string to the
 851     screen and then returns `nil`.
 852
 853
 854 <a name="step5"></a>
 855
 856 ### Step 5: Tail call optimization
 857
 858 ![step5_tco architecture](step5_tco.png)
 859
 860 In step 4 you added special forms `do`, `if` and `fn*` and you defined
 861 some core functions. In this step you will add a Lisp feature called
 862 tail call optimization (TCO). Also called "tail recursion" or
 863 sometimes just "tail calls".
 864
 865 Several of the special forms that you have defined in `EVAL` end up
 866 calling back into `EVAL`. For those forms that call `EVAL` as the last
 867 thing that they do before returning (tail call) you will just loop back
 868 to the beginning of eval rather than calling it again. The advantage
 869 of this approach is that it avoids adding more frames to the call
 870 stack. This is especially important in Lisp languages because they tend
 871 to prefer using recursion instead of iteration for control structures.
 872 (Though some Lisps, such as Common Lisp, have iteration.) However, with
 873 tail call optimization, recursion can be made as stack efficient as
 874 iteration.
 875
 876 Compare the pseudocode for step 4 and step 5 to get a basic idea of
 877 the changes that will be made during this step:
 878 ```
 879 diff -urp ../process/step4_if_fn_do.txt ../process/step5_tco.txt
 880 ```
 881
 882 * Copy `step4_if_fn_do.qx` to `step5_tco.qx`.
 883
 884 * Add a loop (e.g. while true) around all code in `EVAL`.
 885
 886 * Modify each of the following form cases to add tail call recursion
 887   support:
 888   * `let*`: remove the final `EVAL` call on the second `ast` argument
 889     (third list element). Set `env` (i.e. the local variable passed in
 890     as second parameter of `EVAL`) to the new let environment. Set
 891     `ast` (i.e. the local variable passed in as first parameter of
 892     `EVAL`) to be the second `ast` argument. Continue at the beginning
 893     of the loop (no return).
 894   * `do`: change the `eval_ast` call to evaluate all the parameters
 895     except for the last (2nd list element up to but not including
 896     last). Set `ast` to the last element of `ast`. Continue
 897     at the beginning of the loop (`env` stays unchanged).
 898   * `if`: the condition continues to be evaluated, however, rather
 899     than evaluating the true or false branch, `ast` is set to the
 900     unevaluated value of the chosen branch. Continue at the beginning
 901     of the loop (`env` is unchanged).
 902
 903 * The return value from the `fn*` special form will now become an
 904   object/structure with attributes that allow the default invoke case
 905   of `EVAL` to do TCO on mal functions. Those attributes are:
 906   * `ast`: the second `ast` argument (third list element) representing
 907     the body of the function.
 908   * `params`: the first `ast` argument (second list element)
 909     representing the parameter names of the function.
 910   * `env`: the current value of the `env` parameter of `EVAL`.
 911   * `fn`: the original function value (i.e. what was return by `fn*`
 912     in step 4). Note that this is deferrable until step 9 when it is
 913     required for the `map` and `apply` core functions). You will also
 914     need it in step 6 if you choose to not to defer atoms/`swap!` from
 915     that step.
 916
 917 * The default "apply"/invoke case of `EVAL` must now be changed to
 918   account for the new object/structure returned by the `fn*` form.
 919   Continue to call `eval_ast` on `ast`. The first element of the
 920   result of `eval_ast` is `f` and the remaining elements are in `args`.
 921   Switch on the type of `f`:
 922   * regular function (not one defined by `fn*`): apply/invoke it as
 923     before (in step 4).
 924   * a `fn*` value: set `ast` to the `ast` attribute of `f`. Generate
 925     a new environment using the `env` and `params` attributes of `f`
 926     as the `outer` and `binds` arguments and `args` as the `exprs`
 927     argument. Set `env` to the new environment. Continue at the
 928     beginning of the loop.
 929
 930 Run some manual tests from previous steps to make sure you have not
 931 broken anything by adding TCO.
 932
 933 Now go to the top level, run the step 5 tests.
 934
 935 ```
 936 make "test^quux^step5"
 937 ```
 938
 939 Look at the step 5 test file `tests/step5_tco.mal`. The `sum-to`
 940 function cannot be tail call optimized because it does something after
 941 the recursive call (`sum-to` calls itself and then does the addition).
 942 Lispers say that the `sum-to` is not in tail position. The `sum2`
 943 function however, calls itself from tail position. In other words, the
 944 recursive call to `sum2` is the last action that `sum2` does. Calling
 945 `sum-to` with a large value will cause a stack overflow exception in
 946 most target languages (some have super-special tricks they use to
 947 avoid stack overflows).
 948
 949 Congratulations, your mal implementation already has a feature (TCO)
 950 that most mainstream languages lack.
 951
 952
 953 <a name="step6"></a>
 954
 955 ### Step 6: Files, Mutation, and Evil
 956
 957 ![step6_file architecture](step6_file.png)
 958
 959 In step 5 you added tail call optimization. In this step you will add
 960 some string and file operations and give your implementation a touch
 961 of evil ... er, eval. And as long as your language supports function
 962 closures, this step will be quite simple. However, to complete this
 963 step, you must implement string type support, so if you have been
 964 holding off on that you will need to go back and do so.
 965
 966 Compare the pseudocode for step 5 and step 6 to get a basic idea of
 967 the changes that will be made during this step:
 968 ```
 969 diff -urp ../process/step5_tco.txt ../process/step6_file.txt
 970 ```
 971
 972 * Copy `step5_tco.qx` to `step6_file.qx`.
 973
 974 * Add two new string functions to the core namespaces:
 975   * `read-string`: this function just exposes the `read_str` function
 976     from the reader. If your mal string type is not the same as your
 977     target language (e.g. statically typed language) then your
 978     `read-string` function will need to unbox (extract) the raw string
 979     from the mal string type in order to call `read_str`.
 980   * `slurp`: this function takes a file name (string) and returns the
 981     contents of the file as a string. Once again, if your mal string
 982     type wraps a raw target language string, then you will need to
 983     unmarshall (extract) the string parameter to get the raw file name
 984     string and marshall (wrap) the result back to a mal string type.
 985
 986 * In your main program, add a new symbol "eval" to your REPL
 987   environment. The value of this new entry is a function that takes
 988   a single argument `ast`. The closure calls the your `EVAL` function
 989   using the `ast` as the first argument and the REPL environment
 990   (closed over from outside) as the second argument. The result of
 991   the `EVAL` call is returned. This simple but powerful addition
 992   allows your program to treat mal data as a mal program. For example,
 993   you can now to this:
 994 ```
 995 (def! mal-prog (list + 1 2))
 996 (eval mal-prog)
 997 ```
 998
 999 * Define a `load-file` function using mal itself. In your main
1000   program call the `rep` function with this string:
1001   "(def! load-file (fn* (f) (eval (read-string (str \"(do \" (slurp f) \"\nnil)\")))))".
1002
1003 Try out `load-file`:
1004   * `(load-file "../tests/incA.mal")` -> `9`
1005   * `(inc4 3)` -> `7`
1006
1007 The `load-file` function does the following:
1008   * Call `slurp` to read in a file by name. Surround the contents with
1009     "(do ...)" so that the whole file will be treated as a single
1010     program AST (abstract syntax tree). Add a new line in case the files
1011     ends with a comment. The `nil` ensures a short and predictable result,
1012     instead of what happens to be the last function defined in the loaded file.
1013   * Call `read-string` on the string returned from `slurp`. This uses
1014     the reader to read/convert the file contents into mal data/AST.
1015   * Call `eval` (the one in the REPL environment) on the AST returned
1016     from `read-string` to "run" it.
1017
1018 Besides adding file and eval support, we'll add support for the atom data type
1019 in this step.  An atom is the Mal way to represent *state*; it is
1020 heavily inspired by [Clojure's atoms](http://clojure.org/state).  An atom holds
1021 a reference to a single Mal value of any type; it supports reading that Mal value
1022 and *modifying* the reference to point to another Mal value.  Note that this is
1023 the only Mal data type that is mutable (but the Mal values it refers to are
1024 still immutable; immutability is explained in greater detail in step 7).
1025 You'll need to add 5 functions to the core namespace to support atoms:
1026
1027   * `atom`: Takes a Mal value and returns a new atom which points to that Mal value.
1028   * `atom?`: Takes an argument and returns `true` if the argument is an atom.
1029   * `deref`: Takes an atom argument and returns the Mal value referenced by this atom.
1030   * `reset!`: Takes an atom and a Mal value; the atom is modified to refer to
1031     the given Mal value. The Mal value is returned.
1032   * `swap!`: Takes an atom, a function, and zero or more function arguments. The
1033     atom's value is modified to the result of applying the function with the atom's
1034     value as the first argument and the optionally given function arguments as
1035     the rest of the arguments. The new atom's value is returned. (Side note: Mal is
1036     single-threaded, but in concurrent languages like Clojure, `swap!` promises
1037     atomic update: `(swap! myatom (fn* [x] (+ 1 x)))` will always increase the
1038     `myatom` counter by one and will not suffer from missing updates when the
1039     atom is updated from multiple threads.)
1040
1041 Optionally, you can add a reader macro `@` which will serve as a short form for
1042 `deref`, so that `@a` is equivalent to `(deref a)`.  In order to do that, modify
1043 the conditional in reader `read_form` function and add a case which deals with
1044 the `@` token: if the token is `@` (at sign) then return a new list that
1045 contains the symbol `deref` and the result of reading the next form
1046 (`read_form`).
1047
1048 Now go to the top level, run the step 6 tests. The optional tests will
1049 need support from the reader for comments, vectors, hash-maps and the `@`
1050 reader macro:
1051 ```
1052 make "test^quux^step6"
1053 ```
1054
1055 Congratulations, you now have a full-fledged scripting language that
1056 can run other mal programs. The `slurp` function loads a file as
1057 a string, the `read-string` function calls the mal reader to turn that
1058 string into data, and the `eval` function takes data and evaluates it
1059 as a normal mal program. However, it is important to note that the
1060 `eval` function is not just for running external programs. Because mal
1061 programs are regular mal data structures, you can dynamically generate
1062 or manipulate those data structures before calling `eval` on them.
1063 This isomorphism (same shape) between data and programs is known as
1064 "homoiconicity". Lisp languages are homoiconic and this property
1065 distinguishes them from most other programming languages.
1066
1067 You mal implementation is quite powerful already but the set of
1068 functions that are available (from `core.qx`) is fairly limited. The
1069 bulk of the functions you will add are described in step 9 and step A,
1070 but you will begin to flesh them out over the next few steps to
1071 support quoting (step 7) and macros (step 8).
1072
1073
1074 #### Deferrable:
1075
1076 * Add the ability to run another mal program from the command line.
1077   Prior to the REPL loop, check if your mal implementation is called
1078   with command line arguments. If so, treat the first argument as
1079   a filename and use `rep` to call `load-file` on that filename, and
1080   finally exit/terminate execution.
1081
1082 * Add the rest of the command line arguments to your REPL environment
1083   so that programs that are run with `load-file` have access to their
1084   calling environment. Add a new "\*ARGV\*" (symbol) entry to your REPL
1085   environment. The value of this entry should be the rest of the
1086   command line arguments as a mal list value.
1087
1088
1089 <a name="step7"></a>
1090
1091 ### Step 7: Quoting
1092
1093 ![step7_quote architecture](step7_quote.png)
1094
1095 In step 7 you will add the special forms `quote` and `quasiquote` and
1096 add supporting core functions `cons` and `concat`. The two quote forms
1097 add a powerful abstraction for manipulating mal code itself
1098 (meta-programming).
1099
1100 The `quote` special form indicates to the evaluator (`EVAL`) that the
1101 parameter should not be evaluated (yet). At first glance, this might
1102 not seem particularly useful but an example of what this enables is the
1103 ability for a mal program to refer to a symbol itself rather than the
1104 value that it evaluates to. Likewise with lists. For example, consider
1105 the following:
1106
1107 * `(prn abc)`: this will lookup the symbol `abc` in the current
1108   evaluation environment and print it. This will result in error if
1109   `abc` is not defined.
1110 * `(prn (quote abc))`: this will print "abc" (prints the symbol
1111   itself). This will work regardless of whether `abc` is defined in
1112   the current environment.
1113 * `(prn (1 2 3))`: this will result in an error because `1` is not
1114   a function and cannot be applied to the arguments `(2 3)`.
1115 * `(prn (quote (1 2 3)))`: this will print "(1 2 3)".
1116 * `(def! l (quote (1 2 3)))`: list quoting allows us to define lists
1117   directly in the code (list literal). Another way of doing this is
1118   with the list function: `(def! l (list 1 2 3))`.
1119
1120 The second special quoting form is `quasiquote`. This allows a quoted
1121 list to have internal elements of the list that are temporarily
1122 unquoted (normal evaluation). There are two special forms that only
1123 mean something within a quasiquoted list: `unquote` and
1124 `splice-unquote`. These are perhaps best explained with some examples:
1125
1126 * `(def! lst (quote (2 3)))` -> `(2 3)`
1127 * `(quasiquote (1 (unquote lst)))` -> `(1 (2 3))`
1128 * `(quasiquote (1 (splice-unquote lst)))` -> `(1 2 3)`
1129
1130 The `unquote` form turns evaluation back on for its argument and the
1131 result of evaluation is put in place into the quasiquoted list. The
1132 `splice-unquote` also turns evaluation back on for its argument, but
1133 the evaluated value must be a list which is then "spliced" into the
1134 quasiquoted list. The true power of the quasiquote form will be
1135 manifest when it is used together with macros (in the next step).
1136
1137 Compare the pseudocode for step 6 and step 7 to get a basic idea of
1138 the changes that will be made during this step:
1139 ```
1140 diff -urp ../process/step6_file.txt ../process/step7_quote.txt
1141 ```
1142
1143 * Copy `step6_file.qx` to `step7_quote.qx`.
1144
1145 * Before implementing the quoting forms, you will need to implement
1146   some supporting functions in the core namespace:
1147   * `cons`: this function takes a list as its second
1148     parameter and returns a new list that has the first argument
1149     prepended to it.
1150   * `concat`: this functions takes 0 or more lists as
1151     parameters and returns a new list that is a concatenation of all
1152     the list parameters.
1153
1154 An aside on immutability: note that neither cons or concat mutate
1155 their original list arguments. Any references to them (i.e. other
1156 lists that they may be "contained" in) will still refer to the
1157 original unchanged value. Mal, like Clojure, is a language which uses
1158 immutable data structures. I encourage you to read about the power and
1159 importance of immutability as implemented in Clojure (from which
1160 Mal borrows most of its syntax and feature-set).
1161
1162 * Add the `quote` special form. This form just returns its argument
1163   (the second list element of `ast`).
1164
1165 * Add the `quasiquote` special form. First implement a helper function
1166   `is_pair` that returns true if the parameter is a non-empty list.
1167   Then define a `quasiquote` function. This is called from `EVAL` with
1168   the first `ast` argument (second list element) and then `ast` is set
1169   to the result and execution continues at the top of the loop (TCO).
1170   The `quasiquote` function takes a parameter `ast` and has the
1171   following conditional:
1172   1. if `is_pair` of `ast` is false: return a new list containing:
1173      a symbol named "quote" and `ast`.
1174   2. else if the first element of `ast` is a symbol named "unquote":
1175      return the second element of `ast`.
1176   3. if `is_pair` of the first element of `ast` is true and the first
1177      element of first element of `ast` (`ast[0][0]`) is a symbol named
1178      "splice-unquote": return a new list containing: a symbol named
1179      "concat", the second element of first element of `ast`
1180      (`ast[0][1]`), and the result of calling `quasiquote` with the
1181      second through last element of `ast`.
1182   4. otherwise: return a new list containing: a symbol named "cons", the
1183      result of calling `quasiquote` on first element of `ast`
1184      (`ast[0]`), and the result of calling `quasiquote` with the second
1185      through last element of `ast`.
1186
1187
1188 Now go to the top level, run the step 7 tests:
1189 ```
1190 make "test^quux^step7"
1191 ```
1192
1193 Quoting is one of the more mundane functions available in mal, but do
1194 not let that discourage you. Your mal implementation is almost
1195 complete, and quoting sets the stage for the next very exciting step:
1196 macros.
1197
1198
1199 #### Deferrable
1200
1201 * The full names for the quoting forms are fairly verbose. Most Lisp
1202   languages have a short-hand syntax and Mal is no exception. These
1203   short-hand syntaxes are known as reader macros because they allow us
1204   to manipulate mal code during the reader phase. Macros that run
1205   during the eval phase are just called "macros" and are described in
1206   the next section. Expand the conditional with reader `read_form`
1207   function to add the following four cases:
1208   * token is "'" (single quote): return a new list that contains the
1209     symbol "quote" and the result of reading the next form
1210     (`read_form`).
1211   * token is "\`" (back-tick): return a new list that contains the
1212     symbol "quasiquote" and the result of reading the next form
1213     (`read_form`).
1214   * token is "~" (tilde): return a new list that contains the
1215     symbol "unquote" and the result of reading the next form
1216     (`read_form`).
1217   * token is "~@" (tilde + at sign): return a new list that contains
1218     the symbol "splice-unquote" and the result of reading the next
1219     form (`read_form`).
1220
1221 * Add support for quoting of vectors. The `is_pair` function should
1222   return true if the argument is a non-empty list or vector. `cons`
1223   should also accept a vector as the second argument. The return value
1224   is a list regardless. `concat` should support concatenation of
1225   lists, vectors, or a mix or both. The result is always a list.
1226
1227
1228 <a name="step8"></a>
1229
1230 ### Step 8: Macros
1231
1232 ![step8_macros architecture](step8_macros.png)
1233
1234 Your mal implementation is now ready for one of the most lispy and
1235 exciting of all programming concepts: macros. In the previous step,
1236 quoting enabled some simple manipulation data structures and therefore
1237 manipulation of mal code (because the `eval` function from step
1238 6 turns mal data into code). In this step you will be able to mark mal
1239 functions as macros which can manipulate mal code before it is
1240 evaluated. In other words, macros are user-defined special forms. Or
1241 to look at it another way, macros allow mal programs to redefine
1242 the mal language itself.
1243
1244 Compare the pseudocode for step 7 and step 8 to get a basic idea of
1245 the changes that will be made during this step:
1246 ```
1247 diff -urp ../process/step7_quote.txt ../process/step8_macros.txt
1248 ```
1249
1250 * Copy `step7_quote.qx` to `step8_macros.qx`.
1251
1252
1253 You might think that the infinite power of macros would require some
1254 sort of complex mechanism, but the implementation is actually fairly
1255 simple.
1256
1257 * Add a new attribute `is_macro` to mal function types. This should
1258   default to false.
1259
1260 * Add a new special form `defmacro!`. This is very similar to the
1261   `def!` form, but before the evaluated value (mal function) is set in
1262   the environment, the `is_macro` attribute should be set to true.
1263
1264 * Add a `is_macro_call` function: This function takes arguments `ast`
1265   and `env`. It returns true if `ast` is a list that contains a symbol
1266   as the first element and that symbol refers to a function in the
1267   `env` environment and that function has the `is_macro` attribute set
1268   to true. Otherwise, it returns false.
1269
1270 * Add a `macroexpand` function: This function takes arguments `ast`
1271   and `env`. It calls `is_macro_call` with `ast` and `env` and loops
1272   while that condition is true. Inside the loop, the first element of
1273   the `ast` list (a symbol), is looked up in the environment to get
1274   the macro function. This macro function is then called/applied with
1275   the rest of the `ast` elements (2nd through the last) as arguments.
1276   The return value of the macro call becomes the new value of `ast`.
1277   When the loop completes because `ast` no longer represents a macro
1278   call, the current value of `ast` is returned.
1279
1280 * In the evaluator (`EVAL`) before the special forms switch (apply
1281   section), perform macro expansion by calling the `macroexpand`
1282   function with the current value of `ast` and `env`. Set `ast` to the
1283   result of that call. If the new value of `ast` is no longer a list
1284   after macro expansion, then return the result of calling `eval_ast`
1285   on it, otherwise continue with the rest of the apply section
1286   (special forms switch).
1287
1288 * Add a new special form condition for `macroexpand`. Call the
1289   `macroexpand` function using the first `ast` argument (second list
1290   element) and `env`. Return the result. This special form allows
1291   a mal program to do explicit macro expansion without applying the
1292   result (which can be useful for debugging macro expansion).
1293
1294 Now go to the top level, run the step 8 tests:
1295 ```
1296 make "test^quux^step8"
1297 ```
1298
1299 There is a reasonably good chance that the macro tests will not pass
1300 the first time. Although the implementation of macros is fairly
1301 simple, debugging runtime bugs with macros can be fairly tricky. If
1302 you do run into subtle problems that are difficult to solve, let me
1303 recommend a couple of approaches:
1304
1305 * Use the macroexpand special form to eliminate one of the layers of
1306   indirection (to expand but skip evaluate). This will often reveal
1307   the source of the issue.
1308 * Add a debug print statement to the top of your main `eval` function
1309   (inside the TCO loop) to print the current value of `ast` (hint use
1310   `pr_str` to get easier to debug output). Pull up the step8
1311   implementation from another language and uncomment its `eval`
1312   function (yes, I give you permission to violate the rule this once).
1313   Run the two side-by-side. The first difference is likely to point to
1314   the bug.
1315
1316 Congratulations! You now have a Lisp interpreter with a super power
1317 that most non-Lisp languages can only dream of (I have it on good
1318 authority that languages dream when you are not using them). If you
1319 are not already familiar with Lisp macros, I suggest the following
1320 exercise: write a recursive macro that handles postfixed mal code
1321 (with the function as the last parameter instead of the first). Or
1322 not. I have not actually done so myself, but I have heard it is an
1323 interesting exercise.
1324
1325 In the next step you will add try/catch style exception handling to
1326 your implementation in addition to some new core functions. After
1327 step9 you will be very close to having a fully self-hosting mal
1328 implementation. Let us continue!
1329
1330
1331 #### Deferrable
1332
1333 * Add the following new core functions which are frequently used in
1334   macro functions:
1335   * `nth`: this function takes a list (or vector) and a number (index)
1336     as arguments, returns the element of the list at the given index.
1337     If the index is out of range, this function raises an exception.
1338   * `first`: this function takes a list (or vector) as its argument
1339     and return the first element. If the list (or vector) is empty or
1340     is `nil` then `nil` is returned.
1341   * `rest`: this function takes a list (or vector) as its argument and
1342     returns a new list containing all the elements except the first. If
1343     the list (or vector) is empty or is `nil` then `()` (empty list)
1344     is returned.
1345
1346 * In the main program, call the `rep` function with the following
1347   string argument  to define a new control structure.
1348 ```
1349 "(defmacro! cond (fn* (& xs) (if (> (count xs) 0) (list 'if (first xs) (if (> (count xs) 1) (nth xs 1) (throw \"odd number of forms to cond\")) (cons 'cond (rest (rest xs)))))))"
1350 ```
1351     * Note that `cond` calls the `throw` function when `cond` is
1352       called with an odd number of args. The `throw` function is
1353       implemented in the next step, but it will still serve it's
1354       purpose here by causing an undefined symbol error.
1355
1356
1357 <a name="step9"></a>
1358
1359 ### Step 9: Try
1360
1361 ![step9_try architecture](step9_try.png)
1362
1363 In this step you will implement the final mal special form for
1364 error/exception handling: `try*/catch*`. You will also add several core
1365 functions to your implementation. In particular, you will enhance the
1366 functional programming pedigree of your implementation by adding the
1367 `apply` and `map` core functions.
1368
1369 Compare the pseudocode for step 8 and step 9 to get a basic idea of
1370 the changes that will be made during this step:
1371 ```
1372 diff -urp ../process/step8_macros.txt ../process/step9_try.txt
1373 ```
1374
1375 * Copy `step8_macros.qx` to `step9_try.qx`.
1376
1377 * Add the `try*/catch*` special form to the EVAL function. The
1378   try catch form looks like this: `(try* A (catch* B C))`. The form
1379   `A` is evaluated, if it throws an exception, then form `C` is
1380   evaluated with a new environment that binds the symbol `B` to the
1381   value of the exception that was thrown.
1382   * If your target language has built-in try/catch style exception
1383     handling then you are already 90% of the way done. Add a
1384     (native language) try/catch block that evaluates `A` within
1385     the try block and catches all exceptions. If an exception is
1386     caught, then translate it to a mal type/value. For native
1387     exceptions this is either the message string or a mal hash-map
1388     that contains the message string and other attributes of the
1389     exception. When a regular mal type/value is used as an
1390     exception, you will probably need to store it within a native
1391     exception type in order to be able to convey/transport it using
1392     the native try/catch mechanism. Then you will extract the mal
1393     type/value from the native exception. Create a new mal environment
1394     that binds `B` to the value of the exception. Finally, evaluate `C`
1395     using that new environment.
1396   * If your target language does not have built-in try/catch style
1397     exception handling then you have some extra work to do. One of the
1398     most straightforward approaches is to create a a global error
1399     variable that stores the thrown mal type/value. The complication
1400     is that there are a bunch of places where you must check to see if
1401     the global error state is set and return without proceeding. The
1402     rule of thumb is that this check should happen at the top of your
1403     EVAL function and also right after any call to EVAL (and after any
1404     function call that might happen to call EVAL further down the
1405     chain). Yes, it is ugly, but you were warned in the section on
1406     picking a language.
1407
1408 * Add the `throw` core function.
1409   * If your language supports try/catch style exception handling, then
1410     this function takes a mal type/value and throws/raises it as an
1411     exception. In order to do this, you may need to create a custom
1412     exception object that wraps a mal value/type.
1413   * If your language does not support try/catch style exception
1414     handling, then set the global error state to the mal type/value.
1415
1416 * Add the `apply` and `map` core functions. In step 5, if you did not
1417   add the original function (`fn`) to the structure returned from
1418   `fn*`, the you will need to do so now.
1419   * `apply`: takes at least two arguments. The first argument is
1420     a function and the last argument is list (or vector). The
1421     arguments between the function and the last argument (if there are
1422     any) are concatenated with the final argument to create the
1423     arguments that are used to call the function. The apply
1424     function allows a function to be called with arguments that are
1425     contained in a list (or vector). In other words, `(apply F A B [C
1426     D])` is equivalent to `(F A B C D)`.
1427   * `map`: takes a function and a list (or vector) and evaluates the
1428     function against every element of the list (or vector) one at
1429     a time and returns the results as a list.
1430
1431 * Add some type predicates core functions. In Lisp, predicates are
1432   functions that return true/false (or true value/nil) and typically
1433   end in "?" or "p".
1434   * `nil?`: takes a single argument and returns true (mal true value)
1435     if the argument is nil (mal nil value).
1436   * `true?`: takes a single argument and returns true (mal true value)
1437     if the argument is a true value (mal true value).
1438   * `false?`: takes a single argument and returns true (mal true
1439     value) if the argument is a false value (mal false value).
1440   * `symbol?`: takes a single argument and returns true (mal true
1441     value) if the argument is a symbol (mal symbol value).
1442
1443 Now go to the top level, run the step 9 tests:
1444 ```
1445 make "test^quux^step9"
1446 ```
1447
1448 Your mal implementation is now essentially a fully featured Lisp
1449 interpreter. But if you stop now you will miss one of the most
1450 satisfying and enlightening aspects of creating a mal implementation:
1451 self-hosting.
1452
1453 #### Deferrable
1454
1455 * Add the following new core functions:
1456   * `symbol`: takes a string and returns a new symbol with the string
1457     as its name.
1458   * `keyword`: takes a string and returns a keyword with the same name
1459     (usually just be prepending the special keyword
1460     unicode symbol). This function should also detect if the argument
1461     is already a keyword and just return it.
1462   * `keyword?`: takes a single argument and returns true (mal true
1463     value) if the argument is a keyword, otherwise returns false (mal
1464     false value).
1465   * `vector`: takes a variable number of arguments and returns
1466     a vector containing those arguments.
1467   * `vector?`: takes a single argument and returns true (mal true
1468     value) if the argument is a vector, otherwise returns false (mal
1469     false value).
1470   * `sequential?`: takes a single argument and returns true (mal true
1471     value) if it is a list or a vector, otherwise returns false (mal
1472     false value).
1473   * `hash-map`: takes a variable but even number of arguments and
1474     returns a new mal hash-map value with keys from the odd arguments
1475     and values from the even arguments respectively. This is basically
1476     the functional form of the `{}` reader literal syntax.
1477   * `map?`: takes a single argument and returns true (mal true
1478     value) if the argument is a hash-map, otherwise returns false (mal
1479     false value).
1480   * `assoc`: takes a hash-map as the first argument and the remaining
1481     arguments are odd/even key/value pairs to "associate" (merge) into
1482     the hash-map. Note that the original hash-map is unchanged
1483     (remember, mal values are immutable), and a new hash-map
1484     containing the old hash-maps key/values plus the merged key/value
1485     arguments is returned.
1486   * `dissoc`: takes a hash-map and a list of keys to remove from the
1487     hash-map. Again, note that the original hash-map is unchanged and
1488     a new hash-map with the keys removed is returned. Key arguments
1489     that do not exist in the hash-map are ignored.
1490   * `get`: takes a hash-map and a key and returns the value of looking
1491     up that key in the hash-map. If the key is not found in the
1492     hash-map then nil is returned.
1493   * `contains?`: takes a hash-map and a key and returns true (mal true
1494     value) if the key exists in the hash-map and false (mal false
1495     value) otherwise.
1496   * `keys`: takes a hash-map and returns a list (mal list value) of
1497     all the keys in the hash-map.
1498   * `vals`: takes a hash-map and returns a list (mal list value) of
1499     all the values in the hash-map.
1500
1501
1502 <a name="stepA"></a>
1503
1504 ### Step A: Metadata, Self-hosting and Interop
1505
1506 ![stepA_mal architecture](stepA_mal.png)
1507
1508 You have reached the final step of your mal implementation. This step
1509 is kind of a catchall for things that did not fit into other steps.
1510 But most importantly, the changes you make in this step will unlock
1511 the magical power known as "self-hosting". You might have noticed
1512 that one of the languages that mal is implemented in is "mal". Any mal
1513 implementation that is complete enough can run the mal implementation
1514 of mal. You might need to pull out your hammock and ponder this for
1515 a while if you have never built a compiler or interpreter before. Look
1516 at the step source files for the mal implementation of mal (it is not
1517 cheating now that you have reached step A).
1518
1519 If you deferred the implementation of keywords, vectors and hash-maps,
1520 now is the time to go back and implement them if you want your
1521 implementation to self-host.
1522
1523 Compare the pseudocode for step 9 and step A to get a basic idea of
1524 the changes that will be made during this step:
1525 ```
1526 diff -urp ../process/step9_try.txt ../process/stepA_mal.txt
1527 ```
1528
1529 * Copy `step9_try.qx` to `stepA_mal.qx`.
1530
1531 * Add the `readline` core function. This functions takes a
1532   string that is used to prompt the user for input. The line of text
1533   entered by the user is returned as a string. If the user sends an
1534   end-of-file (usually Ctrl-D), then nil is returned.
1535
1536 * Add a new "\*host-language\*" (symbol) entry to your REPL
1537   environment. The value of this entry should be a mal string
1538   containing the name of the current implementation.
1539
1540 * When the REPL starts up (as opposed to when it is called with
1541   a script and/or arguments), call the `rep` function with this string
1542   to print a startup header:
1543   "(println (str \"Mal [\" \*host-language\* \"]\"))".
1544
1545 * Ensure that the REPL environment contains definitions for `time-ms`,
1546   `meta`, `with-meta`, `fn?`
1547   `string?`, `number?`, `seq`, and `conj`.  It doesn't really matter
1548   what they do at this stage: they just need to be defined.  Making
1549   them functions that raise a "not implemented" exception would be
1550   fine.
1551
1552 Now go to the top level, run the step A tests:
1553 ```
1554 make "test^quux^stepA"
1555 ```
1556
1557 Once you have passed all the non-optional step A tests, it is time to
1558 try self-hosting. Run your step A implementation as normal, but use
1559 the file argument mode you added in step 6 to run a each of the step
1560 from the mal implementation:
1561 ```
1562 ./stepA_mal.qx ../mal/step1_read_print.mal
1563 ./stepA_mal.qx ../mal/step2_eval.mal
1564 ...
1565 ./stepA_mal.qx ../mal/step9_try.mal
1566 ./stepA_mal.qx ../mal/stepA_mal.mal
1567 ```
1568
1569 There is a very good chance that you will encounter an error at some
1570 point while trying to run the mal in mal implementation steps above.
1571 Debugging failures that happen while self-hosting is MUCH more
1572 difficult and mind bending. One of the best approaches I have
1573 personally found is to add prn statements to the mal implementation
1574 step (not your own implementation of mal) that is causing problems.
1575
1576 Another approach I have frequently used is to pull out the code from
1577 the mal implementation that is causing the problem and simplify it
1578 step by step until you have a simple piece of mal code that still
1579 reproduces the problem. Once the reproducer is simple enough you will
1580 probably know where in your own implementation that problem is likely
1581 to be. Please add your simple reproducer as a test case so that future
1582 implementers will fix similar issues in their code before they get to
1583 self-hosting when it is much more difficult to track down and fix.
1584
1585 Once you can manually run all the self-hosted steps, it is time to run
1586 all the tests in self-hosted mode:
1587 ```
1588 make MAL_IMPL=quux "test^mal"
1589 ```
1590
1591 When you run into problems (which you almost certainly will), use the
1592 same process described above to debug them.
1593
1594 Congratulations!!! When all the tests pass, you should pause for
1595 a moment and consider what you have accomplished. You have implemented
1596 a Lisp interpreter that is powerful and complete enough to run a large
1597 mal program which is itself an implementation of the mal language. You
1598 might even be asking if you can continue the "inception" by using your
1599 implementation to run a mal implementation which itself runs the mal
1600 implementation.
1601
1602
1603 #### Optional additions
1604
1605 * Add meta-data support to composite data types (lists, vectors
1606   and hash-maps), and to functions (native or not), by adding a new
1607   metadata attribute that refers to another mal value/type
1608   (nil by default). Add the following metadata related core functions
1609   (and remove any stub versions):
1610   * `meta`: this takes a single mal function argument and returns the
1611     value of the metadata attribute.
1612   * `with-meta`: this function takes two arguments. The first argument
1613     is a mal function and the second argument is another mal
1614     value/type to set as metadata. A copy of the mal function is
1615     returned that has its `meta` attribute set to the second argument.
1616     Note that it is important that the environment and macro attribute
1617     of mal function are retained when it is copied.
1618   * Add a reader-macro that expands the token "^" to
1619     return a new list that contains the symbol "with-meta" and the
1620     result of reading the next next form (2nd argument) (`read_form`) and the
1621     next form (1st argument) in that order
1622     (metadata comes first with the ^ macro and the function second).
1623   * If you implemented as `defmacro!` to mutate an existing function
1624     without copying it, you can now use the function copying mechanism
1625     used for metadata to make functions immutable even in the
1626     defmacro! case...
1627
1628 * Add the following new core functions (and remove any stub versions):
1629   * `time-ms`: takes no arguments and returns the number of
1630     milliseconds since epoch (00:00:00 UTC January 1, 1970), or, if
1631     not possible, since another point in time (`time-ms` is usually
1632     used relatively to measure time durations).  After `time-ms` is
1633     implemented, you can run the performance micro-benchmarks by
1634     running `make perf^quux`.
1635   * `conj`: takes a collection and one or more elements as arguments
1636     and returns a new collection which includes the original
1637     collection and the new elements.  If the collection is a list, a
1638     new list is returned with the elements inserted at the start of
1639     the given list in opposite order; if the collection is a vector, a
1640     new vector is returned with the elements added to the end of the
1641     given vector.
1642   * `string?`: returns true if the parameter is a string.
1643   * `number?`: returns true if the parameter is a number.
1644   * `fn?`: returns true if the parameter is a function (internal or
1645     user-defined).
1646   * `macro?`: returns true if the parameter is a macro.
1647   * `seq`: takes a list, vector, string, or nil. If an empty list,
1648     empty vector, or empty string ("") is passed in then nil is
1649     returned. Otherwise, a list is returned unchanged, a vector is
1650     converted into a list, and a string is converted to a list that
1651     containing the original string split into single character
1652     strings.
1653 * For interop with the target language, add this core function:
1654   * `quux-eval`: takes a string, evaluates it in the target language,
1655     and returns the result converted to the relevant Mal type. You
1656     may also add other interop functions as you see fit; Clojure, for
1657     example, has a function called `.` which allows calling Java
1658     methods. If the target language is a static language, consider
1659     using FFI or some language-specific reflection mechanism, if
1660     available. The tests for `quux-eval` and any other interop
1661     function should be added in `quux/tests/stepA_mal.mal` (see the
1662     [tests for `lua-eval`](../lua/tests/stepA_mal.mal) as an example).
1663
1664 ### Next Steps
1665
1666 * Join the #mal IRC channel. It's fairly quiet but there are bursts of
1667   interesting conversation related to mal, Lisps, esoteric programming
1668   languages, etc.
1669 * If you have created an implementation for a new target language (or
1670   a unique and interesting variant of an existing implementation),
1671   consider sending a pull request to add it into the main mal
1672   repository. The [FAQ](../docs/FAQ.md#will-you-add-my-new-implementation)
1673   describes general requirements for getting an implementation merged
1674   into the main repository.
1675 * Take your interpreter implementation and have it emit source code in
1676   the target language rather than immediately evaluating it. In other
1677   words, create a compiler.
1678 * Pick a new target language and implement mal in it. Pick a language
1679   that is very different from any that you already know.
1680 * Use your mal implementation to implement a real world project. Many
1681   of these will force you to address interop. Some ideas:
1682   * Web server (with mal as CGI language for extra points)
1683   * An IRC/Slack chat bot
1684   * An editor (GUI or curses) with mal as a scripting/extension
1685     language.
1686   * An AI player for a game like Chess or Go.
1687 * Implement a feature in your mal implementation that is not covered
1688   by this guide. Some ideas:
1689   * Namespaces
1690   * Multi-threading support
1691   * Errors with line numbers and/or stack traces.
1692   * Lazy sequences
1693   * Clojure-style protocols
1694   * Full call/cc (call-with-current-continuation) support
1695   * Explicit TCO (i.e. `recur`) with tail-position error checking