process/guide.md

   1 # The Make-A-Lisp Process
   2
   3 So you want to write a Lisp interpreter? Welcome!
   4
   5 The goal of the Make-A-Lisp project is to make it easy to write your
   6 own Lisp interpreter without sacrificing those many "Aha!" moments
   7 that come from ascending the McCarthy mountain. When you reach the peak
   8 of this particular mountain, you will have an interpreter for the mal
   9 Lisp language that is powerful enough to be self-hosting, meaning it
  10 will be able to run a mal interpreter written in mal itself.
  11
  12 So jump right in (er ... start the climb)!
  13
  14 - [Pick a language](#pick-a-language)
  15 - [Getting started](#getting-started)
  16 - [General hints](#general-hints)
  17 - [The Make-A-Lisp Process](#the-make-a-lisp-process-1)
  18   - [Step 0: The REPL](#step-0-the-repl)
  19   - [Step 1: Read and Print](#step-1-read-and-print)
  20   - [Step 2: Eval](#step-2-eval)
  21   - [Step 3: Environments](#step-3-environments)
  22   - [Step 4: If Fn Do](#step-4-if-fn-do)
  23   - [Step 5: Tail call optimization](#step-5-tail-call-optimization)
  24   - [Step 6: Files, Mutation, and Evil](#step-6-files-mutation-and-evil)
  25   - [Step 7: Quoting](#step-7-quoting)
  26   - [Step 8: Macros](#step-8-macros)
  27   - [Step 9: Try](#step-9-try)
  28   - [Step A: Metadata, Self-hosting and Interop](#step-a-metadata-self-hosting-and-interop)
  29
  30
  31 ## Pick a language
  32
  33 You might already have a language in mind that you want to use.
Technically speaking, mal can be implemented in any sufficiently
complete programming language (i.e. Turing complete). However, there are
a few language features that can make the task MUCH easier. Here are some
of them in rough order of importance:
  38
  39 * A sequential compound data structure (e.g. arrays, lists,
  40   vectors, etc)
  41 * An associative compound data structure (e.g. a dictionary,
  42   hash-map, associative array, etc)
  43 * Function references (first class functions, function pointers,
  44   etc)
  45 * Real exception handling (try/catch, raise, throw, etc)
  46 * Variable argument functions (variadic, var args, splats, apply, etc)
  47 * Function closures
  48 * PCRE regular expressions
  49
  50 In addition, the following will make your task especially easy:
  51
  52 * Dynamic typing / boxed types (specifically, the ability to store
  53   different data types in the sequential and associative structures
  54   and the language keeps track of the type for you)
* Compound data types that support arbitrary runtime "hidden" data
  (metadata, metatables, dynamic fields/attributes)
  57
  58 Here are some examples of languages that have all of the above
  59 features: JavaScript, Ruby, Python, Lua, R, Clojure.
  60
  61 Michael Fogus has some great blog posts on interesting but less well
  62 known languages and many of the languages on his lists do not yet have
  63 any mal implementations:
  64 * http://blog.fogus.me/2011/08/14/perlis-languages/
  65 * http://blog.fogus.me/2011/10/18/programming-language-development-the-past-5-years/
  66
Many of the most popular languages already have mal implementations.
This should not discourage you from creating your own implementation
in a language that already has one. If you go this route, I suggest
you avoid referring to the existing implementations (i.e. "cheating")
so that you maximize your learning experience instead of just
borrowing mine. On the other hand, if your goal is to add new
implementations to mal as efficiently as possible, then you SHOULD
find the most similar target language implementation and refer to it
frequently.
  76
If you want a fairly long list of programming languages with an
approximate measure of popularity, try the [Programming Language
Popularity Chart](http://langpop.corger.nl/).
  80
  81
  82 ## Getting started
  83
  84 * Install your chosen language interpreter/compiler, language package
  85   manager and build tools (if applicable)
  86
* Fork the mal repository on GitHub and then clone your forked
  repository:
```
git clone git@github.com:YOUR_NAME/mal.git
cd mal
```
  93
  94 * Make a new directory for your implementation. For example, if your
  95 language is called "quux":
```
mkdir quux
```
  99
 100 * Modify the top level Makefile to allow the tests to be run against
 101   your implementation. For example, if your language is named "quux"
 102   and uses "qx" as the file extension, then make the following
 103   3 modifications to Makefile:
```
IMPLS = ... quux ...
...
quux_STEP_TO_PROG = quux/$($(1)).qx
...
quux_RUNSTEP =  ../$(2) $(3)
```
 111
 112 This allows you to run tests against your implementation like this:
```
make "test^quux^stepX"
```
 116
 117 TODO: If your implementation language is a compiled language, then you
 118 should also add a Makefile at the top level of your implementation
 119 directory.
 120
 121 Your Makefile will define how to build the files pointed to by the
 122 quux_STEP_TO_PROG macro. The top-level Makefile will attempt to build
 123 those targets before running tests. If it is a scripting
 124 language/uncompiled, then no Makefile is necessary because
 125 quux_STEP_TO_PROG will point to a source file that already exists and
 126 does not need to be compiled/built.
 127
 128
 129 ## General hints
 130
Stack Overflow and Google are your best friends. Modern polyglot
developers do not memorize dozens of programming languages. Instead,
they learn the peculiar terminology used with each language and then
use this to search for their answers.
 135
 136 Here are some other resources where multiple languages are
 137 compared/described:
 138 * http://learnxinyminutes.com/
 139 * http://hyperpolyglot.org/
 140 * http://rosettacode.org/
 141 * http://rigaux.org/language-study/syntax-across-languages/
 142
 143 Do not let yourself be bogged down by specific problems. While the
 144 make-a-lisp process is structured as a series of steps, the reality is
 145 that building a lisp interpreter is more like a branching tree. If you
 146 get stuck on tail call optimization, or hash-maps, move on to other
 147 things. You will often have a stroke of inspiration for a problem as
 148 you work through other functionality. I have tried to structure this
 149 guide and the tests to make clear which things can be deferred until
 150 later.
 151
An aside on deferrable/optional bits: when you run the tests for
a given step, the last tests are often marked with an "optional"
header. This indicates that these are tests for functionality that is
not critical to finish a basic mal implementation. Many of the steps
in this process guide have a "Deferrable" section, but that term does
not have quite the same meaning. Those sections include the
functionality that is marked as optional in the tests, but they also
include functionality that becomes mandatory at a later step. In other
words, this is a "make your own Lisp adventure".
 161
 162 Use test driven development. Each step of the make-a-lisp process has
 163 a bunch of tests associated with it and there is an easy script to run
 164 all the tests for a specific step in the process. Pick a failing test,
 165 fix it, repeat until all the tests for that step pass.
 166
 167 ## Reference Code
 168
 169 The `process` directory contains abbreviated pseudocode and
 170 architecture images for each step of the make-a-lisp process. Use
 171 a textual diff/comparison tool to compare the previous pseudocode step
 172 with the one you are working on. The architecture images have changes
 173 from the previous step highlighted in red.
 174
If you get completely stuck and are feeling like giving up, then you
should "cheat" by referring to the same step or functionality in
an existing implementation. You are here to learn, not to take
a test, so do not feel bad about it. Okay, you should feel a little
bit bad about it.
 180
 181
 182 ## The Make-A-Lisp Process
 183
In the steps that follow, the name of the target language is "quux" and
the file extension for that language is "qx".
 186
 187
 188 <a name="step0"></a>
 189
 190 ### Step 0: The REPL
 191
 192 ![step0_repl architecture](step0_repl.png)
 193
 194 This step is basically just creating a skeleton of your interpreter.
 195
 196 * Create a `step0_repl.qx` file in `quux/`.
 197
* Add the 4 trivial functions `READ`, `EVAL`, `PRINT`, and `rep`
  (read-eval-print). `READ`, `EVAL`, and `PRINT` are basically just
  stubs that return their first parameter (a string if your target
  language is statically typed) and `rep` calls them in order,
  passing the return value of each to the input of the next.
 203
 204 * Add a main loop that repeatedly prints a prompt (needs to be
 205   "user> " for later tests to pass), gets a line of input from the
 206   user, calls `rep` with that line of input, and then prints out the
 207   result from `rep`. It should also exit when you send it an EOF
 208   (often Ctrl-D).
 209
 210 * If you are using a compiled (ahead-of-time rather than just-in-time)
 211   language, then create a Makefile (or appropriate project definition
 212   file) in your directory.
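
The skeleton described above could look roughly like this in Python. This is only a sketch: the `main` function and its `stdin`/`stdout` parameters are assumptions made here so that the loop can be exercised without a terminal, and are not part of the guide.

```python
import sys


def READ(line):
    # Stub: step 1 replaces this with a call into the reader.
    return line


def EVAL(ast):
    # Stub: step 2 adds real evaluation.
    return ast


def PRINT(exp):
    # Stub: step 1 replaces this with a call into the printer.
    return exp


def rep(line):
    # Feed the result of each phase into the next one.
    return PRINT(EVAL(READ(line)))


def main(stdin=sys.stdin, stdout=sys.stdout):
    while True:
        stdout.write("user> ")
        stdout.flush()
        line = stdin.readline()
        if line == "":  # readline() returns "" only at EOF (e.g. Ctrl-D)
            break
        stdout.write(rep(line.rstrip("\n")) + "\n")
```

Calling `main()` at the bottom of the file starts the loop on the real standard input and output.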
 213
 214 It is time to run your first tests. This will check that your program
 215 does input and output in a way that can be captured by the test
 216 harness. Go to the top level and run the following:
```
make "test^quux^step0"
```
 220
 221 Add and then commit your new `step0_repl.qx` and `Makefile` to git.
 222
 223 Congratulations! You have just completed the first step of the
 224 make-a-lisp process.
 225
 226
 227 #### Optional:
 228
* Add full line editing and command history support to your
  interpreter REPL. Many languages have a library/module that provides
  line editing support. Another option, if your language supports it, is
  to use an FFI (foreign function interface) to load and call directly
  into the GNU readline, editline, or linenoise library. Add the line
  editing interface code to `readline.qx`.
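
As one illustration of the library/module route, Python ships a `readline` module, and importing it is enough for the built-in `input()` to gain line editing. The sketch below adds persistent history on top of that. The helper name `enable_history` and the idea of passing a history file path are assumptions made for this example, and the module is not available on every platform, so the import is guarded.

```python
import atexit
import os

try:
    import readline  # binding to GNU readline or libedit, where available
except ImportError:
    readline = None  # fall back to plain input() without editing


def enable_history(path):
    """Load command history from `path` and save it again at exit.

    Returns True when line editing is active, False otherwise.
    """
    if readline is None:
        return False
    if os.path.exists(path):
        readline.read_history_file(path)
    # Write the (possibly extended) history back when the interpreter exits.
    atexit.register(readline.write_history_file, path)
    return True
```

A REPL would call `enable_history(...)` once at startup with a path of its choosing and then read lines with `input("user> ")`.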
 235
 236
 237 <a name="step1"></a>
 238
 239 ### Step 1: Read and Print
 240
 241 ![step1_read_print architecture](step1_read_print.png)
 242
 243 In this step, your interpreter will "read" the string from the user
 244 and parse it into an internal tree data structure (an abstract syntax
 245 tree) and then take that data structure and "print" it back to
 246 a string.
 247
 248 In non-lisp languages, this step (called "lexing and parsing") can be
 249 one of the most complicated parts of the compiler/interpreter. In
 250 Lisp, the data structure that you want in memory is basically
 251 represented directly in the code that the programmer writes
 252 (homoiconicity).
 253
 254 For example, if the string is "(+ 2 (* 3 4))" then the read function
 255 will process this into a tree structure that looks like this:
```
          List
         / |  \
        /  |   \
       /   |    \
  Sym:+  Int:2  List
               / |  \
              /  |   \
             /   |    \
         Sym:*  Int:3  Int:4
```
 267
 268 Each left paren and its matching right paren (lisp "sexpr") becomes
 269 a node in the tree and everything else becomes a leaf in the tree.
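
In a dynamically typed host language the same tree can be held in ordinary nested lists. The following Python sketch shows one possible in-memory form of `(+ 2 (* 3 4))`. The `Symbol` class and the `describe` helper are hypothetical names introduced only for this illustration.

```python
class Symbol(str):
    """Hypothetical symbol type: a str subclass so symbols can be told apart."""


# (+ 2 (* 3 4)) as nested Python lists: lists are nodes, the rest are leaves.
ast = [Symbol("+"), 2, [Symbol("*"), 3, 4]]


def describe(node, indent=0):
    """Return one line per tree node, indented by depth."""
    pad = "  " * indent
    if isinstance(node, list):
        lines = [pad + "List"]
        for child in node:
            lines.extend(describe(child, indent + 1))
        return lines
    if isinstance(node, Symbol):
        return [pad + "Sym:" + node]
    return [pad + "Int:" + str(node)]


print("\n".join(describe(ast)))
```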
 270
 271 If you can find code for an implementation of a JSON encoder/decoder
 272 in your target language then you can probably just borrow and modify
 273 that and be 75% of the way done with this step.
 274
 275 The rest of this section is going to assume that you are not starting
 276 from an existing JSON encoder/decoder, but that you do have access to
 277 a Perl compatible regular expressions (PCRE) module/library. You can
 278 certainly implement the reader using simple string operations, but it
 279 is more involved. The `make`, `ps` (postscript) and Haskell
 280 implementations have examples of a reader/parser without using regular
 281 expression support.
 282
 283 * Copy `step0_repl.qx` to `step1_read_print.qx`.
 284
 285 * Add a `reader.qx` file to hold functions related to the reader.
 286
* If the target language has object types (OOP), then the next step
  is to create a simple stateful Reader object in `reader.qx`. This
  object will store the tokens and a position. The Reader object will
  have two methods: `next` and `peek`. `next` returns the token at
  the current position and increments the position. `peek` just
  returns the token at the current position.
 293
 294 * Add a function `read_str` in `reader.qx`. This function
 295   will call `tokenizer` and then create a new Reader object instance
 296   with the tokens. Then it will call `read_form` with the Reader
 297   instance.
 298
 299 * Add a function `tokenizer` in `reader.qx`. This function will take
 300   a single string and return an array/list
 301   of all the tokens (strings) in it. The following regular expression
 302   (PCRE) will match all mal tokens.
```
[\s,]*(~@|[\[\]{}()'`~^@]|"(?:\\.|[^\\"])*"|;.*|[^\s\[\]{}('"`,;)]*)
```
* For each match captured within the parentheses starting at char 6 of the
  regular expression a new token will be created.

  * `[\s,]*`: Matches any number of whitespace characters or commas. This is
    not captured so it will be ignored and not tokenized.

  * `~@`: Captures the special two-character sequence `~@` (tokenized).

  * ```[\[\]{}()'`~^@]```: Captures any special single character, one of
    ```[]{}()'`~^@``` (tokenized).

  * `"(?:\\.|[^\\"])*"`: Starts capturing at a double-quote and stops at the
    next double-quote unless it was preceded by a backslash, in which case it
    includes it and continues until the next double-quote (tokenized).
 320
 321   * `;.*`: Captures any sequence of characters starting with `;` (tokenized).
 322
 323   * ```[^\s\[\]{}('"`,;)]*```: Captures a sequence of zero or more non special
 324     characters (e.g. symbols, numbers, "true", "false", and "nil") and is sort
 325     of the inverse of the one above that captures special characters (tokenized).
 326
 327 * Add the function `read_form` to `reader.qx`. This function
 328   will peek at the first token in the Reader object and switch on the
 329   first character of that token. If the character is a left paren then
 330   `read_list` is called with the Reader object. Otherwise, `read_atom`
 331   is called with the Reader Object. The return value from `read_form`
 332   is a mal data type. If your target language is statically typed then
 333   you will need some way for `read_form` to return a variant or
 334   subclass type. For example, if your language is object oriented,
 335   then you can define a top level MalType (in `types.qx`) that all
  your mal data types inherit from. The MalList type (which also
  inherits from MalType) will contain a list/array of other MalTypes.
 338   If your language is dynamically typed then you can likely just
 339   return a plain list/array of other mal types.
 340
* Add the function `read_list` to `reader.qx`. This function will
  repeatedly call `read_form` with the Reader object until it
  encounters a ')' token (if it reaches EOF before reading a ')' then
  that is an error). It accumulates the results into a List type.  If
 345   your language does not have a sequential data type that can hold mal
 346   type values you may need to implement one (in `types.qx`).  Note
 347   that `read_list` repeatedly calls `read_form` rather than
 348   `read_atom`. This mutually recursive definition between `read_list`
 349   and `read_form` is what allows lists to contain lists.
 350
* Add the function `read_atom` to `reader.qx`. This function will
  look at the contents of the token and return the appropriate scalar
  (simple/single) data type value. Initially, you can just implement
  numbers (integers) and symbols. This will allow you to proceed
  through the next couple of steps before you will need to implement
  the other fundamental mal types: nil, true, false, and string. The
  remaining mal types: keyword, vector, hash-map, and atom do not
  need to be implemented until step 9 (but can be implemented at any
  point between this step and that). BTW, symbol types are just an
  object that contains a single string name value (some languages have
  symbol types already).
 362
 363 * Add a file `printer.qx`. This file will contain a single function
 364   `pr_str` which does the opposite of `read_str`: take a mal data
 365   structure and return a string representation of it. But `pr_str` is
 366   much simpler and is basically just a switch statement on the type of
 367   the input object:
 368
 369   * symbol: return the string name of the symbol
 370   * number: return the number as a string
 371   * list: iterate through each element of the list calling `pr_str` on
 372     it, then join the results with a space separator, and surround the
 373     final result with parens
 374
 375 * Change the `READ` function in `step1_read_print.qx` to call
 376   `reader.read_str` and the `PRINT` function to call `printer.pr_str`.
 377   `EVAL` continues to simply return its input but the type is now
 378   a mal data type.
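
Put together, the reader and printer pieces described above might be sketched in Python as follows. This is a minimal illustration rather than a reference implementation: it handles only integers, symbols and lists, the `Symbol` class is a hypothetical stand-in for a symbol type, and using `None` to signal end of input in `peek` is an assumption of this sketch.

```python
import re

TOKEN_RE = re.compile(
    r"""[\s,]*(~@|[\[\]{}()'`~^@]|"(?:\\.|[^\\"])*"|;.*|[^\s\[\]{}('"`,;)]*)""")


class Symbol(str):
    """Hypothetical symbol type: a str subclass holding the symbol name."""


class Reader:
    def __init__(self, tokens):
        self.tokens = tokens
        self.position = 0

    def peek(self):
        # Return the current token without advancing (None at end of input).
        if self.position < len(self.tokens):
            return self.tokens[self.position]
        return None

    def next(self):
        # Return the current token and advance the position.
        token = self.peek()
        self.position += 1
        return token


def tokenizer(text):
    # findall yields the captured group of each match; drop empty matches.
    return [tok for tok in TOKEN_RE.findall(text) if tok]


def read_atom(reader):
    token = reader.next()
    if re.fullmatch(r"-?[0-9]+", token):
        return int(token)
    return Symbol(token)


def read_list(reader):
    reader.next()  # consume the "("
    items = []
    while reader.peek() != ")":
        if reader.peek() is None:
            raise ValueError("expected ')', got EOF")
        items.append(read_form(reader))  # recursion lets lists hold lists
    reader.next()  # consume the ")"
    return items


def read_form(reader):
    token = reader.peek()
    if token is None:
        raise ValueError("no input")
    if token == "(":
        return read_list(reader)
    return read_atom(reader)


def read_str(text):
    return read_form(Reader(tokenizer(text)))


def pr_str(value):
    if isinstance(value, list):
        return "(" + " ".join(pr_str(item) for item in value) + ")"
    return str(value)  # numbers and symbols print as their text
```

With these definitions, `pr_str(read_str("(  + 2   (*  3  4)  )  "))` goes through the tokenizer, the mutually recursive `read_form`/`read_list` pair, and back out through the printer.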
 379
 380 You now have enough hooked up to begin testing your code. You can
 381 manually try some simple inputs:
 382   * `123` -> `123`
 383   * `   123  ` -> `123`
 384   * `abc` -> `abc`
 385   * `   abc   ` -> `abc`
 386   * `(123 456)` -> `(123 456)`
 387   * `(  123   456 789   )   ` -> `(123 456 789)`
 388   * `(  + 2   (*  3  4)  )  ` -> `(+ 2 (* 3 4))`
 389
 390 To verify that your code is doing more than just eliminating extra
 391 spaces (and not failing), you can instrument your `reader.qx` functions.
 392
 393 Once you have gotten past those simple manual tests, it is time to run
 394 the full suite of step 1 tests. Go to the top level and run the
 395 following:
```
make "test^quux^step1"
```
 399
 400 Fix any test failures related to symbols, numbers and lists.
 401
Depending on the functionality of your target language, it is likely
that you have now just completed one of the most difficult steps. It
is downhill from here. The remaining steps will probably be easier
and each step will give progressively more bang for the buck.
 406
 407 #### Deferrable:
 408
 409
 410 * Add error checking to your reader functions to make sure parens
 411   are properly matched. Catch and print these errors in your main
 412   loop. If your language does not have try/catch style bubble up
 413   exception handling, then you will need to add explicit error
 414   handling to your code to catch and pass on errors without crashing.
 415
* Add support for the other basic data types to your reader and printer
  functions: string, nil, true, and false. These become mandatory at
 418   step 4. When a string is read, the following transformations are
 419   applied: a backslash followed by a doublequote is translated into
 420   a plain doublequote character, a backslash followed by "n" is
 421   translated into a newline, and a backslash followed by another
 422   backslash is translated into a single backslash. To properly print
 423   a string (for step 4 string functions), the `pr_str` function needs
 424   another parameter called `print_readably`.  When `print_readably` is
 425   true, doublequotes, newlines, and backslashes are translated into
 426   their printed representations (the reverse of the reader). The
 427   `PRINT` function in the main program should call `pr_str` with
 428   print_readably set to true.
 429
 430 * Add support for the other mal types: keyword, vector, hash-map.
  * keyword: a keyword is a token that begins with a colon. A keyword
    can just be stored as a string with a special unicode prefix like
    0x29E (or char 0xff or 127 if the target language does not have good
    unicode support) and the printer translates strings with that
 435     prefix back to the keyword representation. This makes it easy to
 436     use keywords as hash map keys in most languages. You can also
 437     store keywords as a unique data type, but you will need to make
 438     sure they can be used as hash map keys (which may involve doing
 439     a similar prefixed translation anyways).
  * vector: a vector can be implemented with the same underlying
    type as a list as long as there is some mechanism to keep track of
    the difference. You can use the same reader function for both
    lists and vectors by adding parameters for the starting and ending
    tokens.
 445   * hash-map: a hash-map is an associative data structure that maps
 446     strings to other mal values. If you implement keywords as prefixed
 447     strings, then you only need a native associative data structure
 448     which supports string keys. Clojure allows any value to be a hash
 449     map key, but the base functionality in mal is to support strings
 450     and keyword keys. Because of the representation of hash-maps as
 451     an alternating sequence of keys and values, you can probably use
 452     the same reader function for hash-maps as lists and vectors with
 453     parameters to indicate the starting and ending tokens. The odd
 454     tokens are then used for keys with the corresponding even tokens
 455     as the values.
 456
 457 * Add support for reader macros which are forms that are
 458   transformed into other forms during the read phase. Refer to
 459   `tests/step1_read_print.mal` for the form that these macros should
 460   take (they are just simple transformations of the token stream).
 461
* Add comment support to your reader. The tokenizer should ignore
  tokens that start with ";". Your `read_str` function will need to
  properly handle when the tokenizer returns no values. The simplest
  way to do this is to return a `nil` mal value. A cleaner option (that
  does not print `nil` at the prompt) is to throw a special exception
  that causes the main loop to simply continue at the beginning of the
  loop without calling `rep`.
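
The string and keyword items above can be sketched in Python as shown here. The helper names `unescape`, `escape` and `read_scalar` are hypothetical, the host values `None`/`True`/`False` are assumed to stand in for `nil`/`true`/`false`, and passing an unrecognized escaped character through unchanged is a choice of this sketch that the guide does not specify.

```python
KEYWORD_PREFIX = "\u029e"  # the 0x29E prefix suggested for keywords


def unescape(token):
    """Convert a string token (including its quotes) to the string value."""
    body = token[1:-1]
    out = []
    i = 0
    while i < len(body):
        if body[i] == "\\" and i + 1 < len(body):
            nxt = body[i + 1]
            # \" -> ", \n -> newline, \\ -> \ ; anything else passes through.
            out.append({"n": "\n", '"': '"', "\\": "\\"}.get(nxt, nxt))
            i += 2
        else:
            out.append(body[i])
            i += 1
    return "".join(out)


def escape(value):
    """Reverse of unescape: backslashes must be doubled first."""
    body = value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return '"' + body + '"'


def read_scalar(token):
    """Hypothetical helper covering the deferrable scalar tokens."""
    if token == "nil":
        return None
    if token == "true":
        return True
    if token == "false":
        return False
    if token.startswith('"'):
        return unescape(token)
    if token.startswith(":"):
        return KEYWORD_PREFIX + token[1:]
    raise ValueError("not a deferrable scalar: " + token)


def pr_str(value, print_readably=True):
    if value is None:
        return "nil"
    if value is True:
        return "true"
    if value is False:
        return "false"
    if isinstance(value, str):
        if value.startswith(KEYWORD_PREFIX):
            return ":" + value[len(KEYWORD_PREFIX):]
        return escape(value) if print_readably else value
    return str(value)
```

Scanning the string body one character at a time avoids a common pitfall of chained replacements, where an escaped backslash followed by `n` is mistaken for a newline escape.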
 469
 470
 471 <a name="step2"></a>
 472
 473 ### Step 2: Eval
 474
 475 ![step2_eval architecture](step2_eval.png)
 476
 477 In step 1 your mal interpreter was basically just a way to validate
 478 input and eliminate extraneous white space. In this step you will turn
 479 your interpreter into a simple number calculator by adding
 480 functionality to the evaluator (`EVAL`).
 481
 482 Compare the pseudocode for step 1 and step 2 to get a basic idea of
 483 the changes that will be made during this step:
```
diff -urp ../process/step1_read_print.txt ../process/step2_eval.txt
```
 487
 488 * Copy `step1_read_print.qx` to `step2_eval.qx`.
 489
 490 * Define a simple initial REPL environment. This environment is an
 491   associative structure that maps symbols (or symbol names) to
 492   numeric functions. For example, in python this would look something
 493   like this:
```
repl_env = {'+': lambda a,b: a+b,
            '-': lambda a,b: a-b,
            '*': lambda a,b: a*b,
            '/': lambda a,b: int(a/b)}
```
 500
 501 * Modify the `rep` function to pass the REPL environment as the second
 502   parameter for the `EVAL` call.
 503
 504 * Create a new function `eval_ast` which takes `ast` (mal data type)
 505   and an associative structure (the environment from above).
 506   `eval_ast` switches on the type of `ast` as follows:
 507
  * symbol: lookup the symbol in the environment structure and return
    the value or raise an error if no value is found
 510   * list: return a new list that is the result of calling `EVAL` on
 511     each of the members of the list
 512   * otherwise just return the original `ast` value
 513
 514 * Modify `EVAL` to check if the first parameter `ast` is a list.
 515   * `ast` is not a list: then return the result of calling `eval_ast`
 516     on it.
  * `ast` is an empty list: return ast unchanged.
  * `ast` is a list: call `eval_ast` to get a new evaluated list. Take
    the first item of the evaluated list and call it as a function using
    the rest of the evaluated list as its arguments.
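
The `eval_ast`/`EVAL` pair described above could be sketched in Python like this, assuming the AST uses plain Python lists and integers plus a hypothetical `Symbol` class (a `str` subclass) for symbols, and that the environment is the plain dictionary from the earlier example.

```python
class Symbol(str):
    """Hypothetical symbol type, distinct from plain strings."""


repl_env = {
    "+": lambda a, b: a + b,
    "-": lambda a, b: a - b,
    "*": lambda a, b: a * b,
    "/": lambda a, b: int(a / b),
}


def eval_ast(ast, env):
    if isinstance(ast, Symbol):
        if ast not in env:
            raise KeyError("'" + ast + "' not found")
        return env[ast]
    if isinstance(ast, list):
        # Evaluate every member, including the function position.
        return [EVAL(item, env) for item in ast]
    return ast


def EVAL(ast, env):
    if not isinstance(ast, list):
        return eval_ast(ast, env)
    if len(ast) == 0:
        return ast
    evaluated = eval_ast(ast, env)
    fn, args = evaluated[0], evaluated[1:]
    return fn(*args)  # the "apply" phase


# (+ 2 (* 3 4))
print(EVAL([Symbol("+"), 2, [Symbol("*"), 3, 4]], repl_env))
```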
 521
 522 If your target language does not have full variable length argument
 523 support (e.g. variadic, vararg, splats, apply) then you will need to
 524 pass the full list of arguments as a single parameter and split apart
 525 the individual values inside of every mal function. This is annoying,
 526 but workable.
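
The two calling conventions can be contrasted with a small Python sketch. Python does support splats, so the "packed" variant is shown only to illustrate what the workaround looks like. All four function names are made up for this example.

```python
def add_variadic(a, b):
    # Receives its arguments already spread out by the caller.
    return a + b


def add_packed(args):
    # Receives one list and has to split the values apart itself.
    return args[0] + args[1]


def apply_variadic(fn, args):
    # With splat support, the evaluator spreads the argument list.
    return fn(*args)


def apply_packed(fn, args):
    # Without it, the evaluator passes the whole list as one parameter.
    return fn(args)
```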
 527
 528 The process of taking a list and invoking or executing it to return
 529 something new is known in Lisp as the "apply" phase.
 530
 531 Try some simple expressions:
 532
 533   * `(+ 2 3)` -> `5`
 534   * `(+ 2 (* 3 4))` -> `14`
 535
The most likely challenge you will encounter is how to properly call
a function reference using an argument list.
 538
 539 Now go to the top level, run the step 2 tests and fix the errors.
```
make "test^quux^step2"
```
 543
 544 You now have a simple prefix notation calculator!
 545
 546
 547 <a name="step3"></a>
 548
 549 ### Step 3: Environments
 550
 551 ![step3_env architecture](step3_env.png)
 552
In step 2 you were already introduced to the REPL environment (`repl_env`)
where the basic numeric functions were stored and looked up. In this
step you will add the ability to create new environments (`let*`) and
modify existing environments (`def!`).
 557
 558 A Lisp environment is an associative data structure that maps symbols (the
 559 keys) to values. But Lisp environments have an additional important
 560 function: they can refer to another environment (the outer
 561 environment). During environment lookups, if the current environment
 562 does not have the symbol, the lookup continues in the outer
 563 environment, and continues this way until the symbol is either found,
 564 or the outer environment is `nil` (the outermost environment in the
 565 chain).
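
The lookup rule can be illustrated with a few lines of Python. This sketch deliberately uses plain dictionaries instead of the `Env` object defined later in this step. The `make_env` and `lookup` names are assumptions made for this illustration, and `None` plays the role of the `nil` outer environment.

```python
def make_env(outer=None):
    # An environment is its own bindings plus a link to the outer one.
    return {"data": {}, "outer": outer}


def lookup(env, name):
    # Walk outward until the name is found or the chain runs out.
    while env is not None:
        if name in env["data"]:
            return env["data"][name]
        env = env["outer"]
    raise KeyError("'" + name + "' not found")


outermost = make_env()
outermost["data"]["a"] = 1
inner = make_env(outermost)
inner["data"]["b"] = 2

print(lookup(inner, "b"))  # found in the inner environment
print(lookup(inner, "a"))  # found by following the outer link
```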
 566
 567 Compare the pseudocode for step 2 and step 3 to get a basic idea of
 568 the changes that will be made during this step:
```
diff -urp ../process/step2_eval.txt ../process/step3_env.txt
```
 572
 573 * Copy `step2_eval.qx` to `step3_env.qx`.
 574
 575 * Create `env.qx` to hold the environment definition.
 576
 577 * Define an `Env` object that is instantiated with a single `outer`
 578   parameter and starts with an empty associative data structure
 579   property `data`.
 580
 581 * Define three methods for the Env object:
 582   * set: takes a symbol key and a mal value and adds to the `data`
 583     structure
 584   * find: takes a symbol key and if the current environment contains
 585     that key then return the environment. If no key is found and outer
 586     is not `nil` then call find (recurse) on the outer environment.
 587   * get: takes a symbol key and uses the `find` method to locate the
 588     environment with the key, then returns the matching value. If no
 589     key is found up the outer chain, then throws/raises a "not found"
 590     error.
 591
 592 * Update `step3_env.qx` to use the new `Env` type to create the
 593   repl_env (with a `nil` outer value) and use the `set` method to add
 594   the numeric functions.
 595
 596 * Modify `eval_ast` to call the `get` method on the `env` parameter.
 597
 598 * Modify the apply section of `EVAL` to switch on the first element of
 599   the list:
 600   * symbol "def!": call the set method of the current environment
 601     (second parameter of `EVAL` called `env`) using the unevaluated
 602     first parameter (second list element) as the symbol key and the
 603     evaluated second parameter as the value.
 604   * symbol "let\*": create a new environment using the current
 605     environment as the outer value and then use the first parameter as
 606     a list of new bindings in the "let\*" environment. Take the second
 607     element of the binding list, call `EVAL` using the new "let\*"
 608     environment as the evaluation environment, then call `set` on the
 609     "let\*" environment using the first binding list element as the key
 610     and the evaluated second element as the value. This is repeated
 611     for each odd/even pair in the binding list. Note in particular,
 612     the bindings earlier in the list can be referred to by later
 613     bindings. Finally, the second parameter (third element) of the
 614     original `let*` form is evaluated using the new "let\*" environment
 615     and the result is returned as the result of the `let*` (the new
 616     let environment is discarded upon completion).
 617   * otherwise: call `eval_ast` on the list and apply the first element
 618     to the rest as before.
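
The following Python sketch pulls the pieces of this step together: an `Env` with `set`, `find` and `get`, and an `EVAL` that handles `def!` and `let*` before falling back to the default apply case. It assumes a hypothetical `Symbol` class (a `str` subclass) for symbols and plain Python lists for mal lists. Returning the value from `set` is a convenience of this sketch.

```python
class Symbol(str):
    """Hypothetical symbol type, distinct from plain strings."""


class Env:
    def __init__(self, outer=None):
        self.outer = outer
        self.data = {}

    def set(self, key, value):
        self.data[key] = value
        return value

    def find(self, key):
        # Return the environment that holds key, or None if nobody does.
        if key in self.data:
            return self
        if self.outer is not None:
            return self.outer.find(key)
        return None

    def get(self, key):
        env = self.find(key)
        if env is None:
            raise KeyError("'" + key + "' not found")
        return env.data[key]


def eval_ast(ast, env):
    if isinstance(ast, Symbol):
        return env.get(ast)
    if isinstance(ast, list):
        return [EVAL(item, env) for item in ast]
    return ast


def EVAL(ast, env):
    if not isinstance(ast, list):
        return eval_ast(ast, env)
    if len(ast) == 0:
        return ast
    head = ast[0] if isinstance(ast[0], Symbol) else None
    if head == "def!":
        # The key stays unevaluated; only the value is evaluated.
        return env.set(ast[1], EVAL(ast[2], env))
    if head == "let*":
        let_env = Env(env)
        bindings = ast[1]
        for i in range(0, len(bindings), 2):
            # Evaluating in let_env lets later bindings see earlier ones.
            let_env.set(bindings[i], EVAL(bindings[i + 1], let_env))
        return EVAL(ast[2], let_env)
    evaluated = eval_ast(ast, env)
    return evaluated[0](*evaluated[1:])
```

Because the `let*` body is evaluated in `let_env` and that environment is never stored anywhere, its bindings disappear once the form returns.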
 619
`def!` and `let*` are Lisp "specials" (or "special atoms") which means
that they are language level features and more specifically that the
rest of the list elements (arguments) may be evaluated differently (or
not at all) unlike the default apply case where all elements of the
list are evaluated before the first element is invoked. Lists which
contain a "special" as the first element are known as "special forms".
They are special because they follow special evaluation rules.
 627
 628 Try some simple environment tests:
 629
 630   * `(def! a 6)` -> `6`
 631   * `a` -> `6`
 632   * `(def! b (+ a 2))` -> `8`
 633   * `(+ a b)` -> `14`
 634   * `(let* (c 2) c)` -> `2`
 635
 636 Now go to the top level, run the step 3 tests and fix the errors.
```
make "test^quux^step3"
```
 640
Your mal implementation is still basically just a numeric calculator
with save/restore capability. But you have set the foundation for step
4 where it will begin to feel like a real programming language.
 644
 645
 646 An aside on mutation and typing:
 647
 648 The "!" suffix on symbols is used to indicate that this symbol refers
 649 to a function that mutates something else. In this case, the `def!`
 650 symbol indicates a special form that will mutate the current
environment. Many (maybe even most) of the runtime problems that are
encountered in software engineering are a result of mutation. By
 653 clearly marking code where mutation may occur, you can more easily
 654 track down the likely cause of runtime problems when they do occur.
 655
 656 Another cause of runtime errors is type errors, where a value of one
 657 type is unexpectedly treated by the program as a different and
 658 incompatible type. Statically typed languages try to make the
 659 programmer solve all type problems before the program is allowed to
 660 run. Most Lisp variants tend to be dynamically typed (types of values
 661 are checked when they are actually used at runtime).
 662
 663 As an aside-aside: The great debate between static and dynamic typing
 664 can be understood by following the money. Advocates of strict static
 665 typing use words like "correctness" and "safety" and thus get
 666 government and academic funding. Advocates of dynamic typing use words
 667 like "agile" and "time-to-market" and thus get venture capital and
 668 commercial funding.
 669
 670
 671 <a name="step4"></a>
 672
 673 ### Step 4: If Fn Do
 674
 675 ![step4_if_fn_do architecture](step4_if_fn_do.png)
 676
In step 3 you added environments and the special forms for
manipulating environments. In this step you will add 3 new special
forms (`if`, `fn*` and `do`) and add several more core functions to
the default REPL environment.
 682
 683 The `fn*` special form is how new user-defined functions are created.
 684 In some Lisps, this special form is named "lambda".
 685
 686 Compare the pseudocode for step 3 and step 4 to get a basic idea of
 687 the changes that will be made during this step:
 688 ```
 689 diff -urp ../process/step3_env.txt ../process/step4_if_fn_do.txt
 690 ```
 691
 692 * Copy `step3_env.qx` to `step4_if_fn_do.qx`.
 693
 694 * If you have not implemented reader and printer support (and data
 695   types) for `nil`, `true` and `false`, you will need to do so for
 696   this step.
 697
 698 * Update the constructor/initializer for environments to take two new
 699   arguments: `binds` and `exprs`. Bind (`set`) each element (symbol)
 700   of the binds list to the respective element of the `exprs` list.
 701
* Add support to `printer.qx` to print function values. A string
  literal like "#<function>" is sufficient.
 704
 705 * Add the following special forms to `EVAL`:
 706
 707   * `do`: Evaluate all the elements of the list using `eval_ast`
 708     and return the final evaluated element.
 709   * `if`: Evaluate the first parameter (second element). If the result
 710     (condition) is anything other than `nil` or `false`, then evaluate
 711     the second parameter (third element of the list) and return the
 712     result.  Otherwise, evaluate the third parameter (fourth element)
 713     and return the result. If condition is false and there is no third
 714     parameter, then just return `nil`.
 715   * `fn*`: Return a new function closure. The body of that closure
 716     does the following:
 717     * Create a new environment using `env` (closed over from outer
 718       scope) as the `outer` parameter, the first parameter (second
 719       list element of `ast` from the outer scope) as the `binds`
 720       parameter, and the parameters to the closure as the `exprs`
 721       parameter.
 722     * Call `EVAL` on the second parameter (third list element of `ast`
 723       from outer scope), using the new environment. Use the result as
 724       the return value of the closure.
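
The environment change and the three special forms above can be sketched roughly as follows. Python is used purely as a stand-in target language. The `Symbol` class, the use of Python lists for mal lists, and `None` for `nil` are assumptions made for this example, not something the guide prescribes. The `def!` and `let*` forms from step 3 are left out to keep the sketch short.

```python
class Symbol(str):
    """Stand-in for a mal symbol type (assumption for this sketch)."""


class Env:
    def __init__(self, outer=None, binds=(), exprs=()):
        self.outer = outer
        self.data = {}
        # Bind each symbol in binds to the respective element of exprs.
        for sym, value in zip(binds, exprs):
            self.set(sym, value)

    def set(self, key, value):
        self.data[key] = value
        return value

    def get(self, key):
        env = self
        while env is not None:
            if key in env.data:
                return env.data[key]
            env = env.outer
        raise KeyError(f"'{key}' not found")


def eval_ast(ast, env):
    if isinstance(ast, Symbol):
        return env.get(ast)
    if isinstance(ast, list):
        return [EVAL(item, env) for item in ast]
    return ast


def EVAL(ast, env):
    if not isinstance(ast, list):
        return eval_ast(ast, env)
    if not ast:
        return ast
    # Only a symbol in first position can name a special form.
    head = ast[0] if isinstance(ast[0], Symbol) else None
    if head == "do":
        # Evaluate every element after `do` and return the last result.
        return eval_ast(ast[1:], env)[-1]
    if head == "if":
        cond = EVAL(ast[1], env)
        if cond is not None and cond is not False:
            return EVAL(ast[2], env)
        # Missing else branch: the result is nil.
        return EVAL(ast[3], env) if len(ast) > 3 else None
    if head == "fn*":
        params, body = ast[1], ast[2]
        # A Python closure over env, params and body.
        return lambda *args: EVAL(body, Env(env, params, args))
    f, *args = eval_ast(ast, env)
    return f(*args)
```

Note that only `nil` and `false` count as false in the `if` branch, so a condition of `0` or an empty list still selects the first branch.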
 725
If your target language does not support closures, then you will need
to implement `fn*` using some sort of structure or object that stores
the values being closed over: the first and second parameters of the
`ast` list (its second and third elements: the function parameter list
and the function body) and the current environment `env`. In this case,
your native functions will need to be wrapped in the same way. You will
probably also need a method/function that invokes your function
object/structure for the default case of the apply section of `EVAL`.
 734
 735 Try out the basic functionality you have implemented:
 736
 737   * `(fn* [a] a)` -> `#<function>`
 738   * `( (fn* [a] a) 7)` -> `7`
 739   * `( (fn* [a] (+ a 1)) 10)` -> `11`
 740   * `( (fn* [a b] (+ a b)) 2 3)` -> `5`
 741
 742 * Add a new file `core.qx` and define an associative data structure
 743   `ns` (namespace) that maps symbols to functions. Move the numeric
 744   function definitions into this structure.
 745
 746 * Modify `step4_if_fn_do.qx` to iterate through the `core.ns`
 747   structure and add (`set`) each symbol/function mapping to the
 748   REPL environment (`repl_env`).
 749
 750 * Add the following functions to `core.ns`:
 751   * `list`: take the parameters and return them as a list.
 752   * `list?`: return true if the first parameter is a list, false
 753     otherwise.
 754   * `empty?`: treat the first parameter as a list and return true if
 755     the list is empty and false if it contains any elements.
 756   * `count`: treat the first parameter as a list and return the number
 757     of elements that it contains.
 758   * `=`: compare the first two parameters and return true if they are
 759     the same type and contain the same value. In the case of equal
 760     length lists, each element of the list should be compared for
 761     equality and if they are the same return true, otherwise false.
 762   * `<`, `<=`, `>`, and `>=`: treat the first two parameters as
 763     numbers and do the corresponding numeric comparison, returning
 764     either true or false.
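
A minimal sketch of what the `ns` structure could look like, again using Python as a stand-in target language. It assumes mal lists are Python lists and that `nil`, `true` and `false` map to `None`, `True` and `False`. The helper name `mal_equal` is hypothetical.

```python
import operator


def mal_equal(a, b):
    """Mal `=`: same type and same value; lists compare element-wise."""
    if isinstance(a, list) and isinstance(b, list):
        return len(a) == len(b) and all(mal_equal(x, y) for x, y in zip(a, b))
    # Comparing types first keeps e.g. 1 and true from being equal.
    return type(a) is type(b) and a == b


# Maps mal symbol names to native functions.
ns = {
    "list": lambda *args: list(args),
    "list?": lambda x: isinstance(x, list),
    "empty?": lambda seq: len(seq) == 0,
    "count": lambda seq: len(seq),
    "=": mal_equal,
    "<": operator.lt,
    "<=": operator.le,
    ">": operator.gt,
    ">=": operator.ge,
}
```

The main program would then loop over `ns.items()` and `set` each entry in `repl_env`.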
 765
 766 Now go to the top level, run the step 4 tests. There are a lot of
 767 tests in step 4 but all of the non-optional tests that do not involve
 768 strings should be able to pass now.
 769
 770 ```
 771 make "test^quux^step4"
 772 ```
 773
 774 Your mal implementation is already beginning to look like a real
 775 language. You have flow control, conditionals, user-defined functions
 776 with lexical scope, side-effects (if you implement the string
 777 functions), etc. However, our little interpreter has not quite reached
 778 Lisp-ness yet. The next several steps will take your implementation
 779 from a neat toy to a full featured language.
 780
 781 #### Deferrable:
 782
 783 * Implement Clojure-style variadic function parameters. Modify the
 784   constructor/initializer for environments, so that if a "&" symbol is
 785   encountered in the `binds` list, the next symbol in the `binds` list
 786   after the "&" is bound to the rest of the `exprs` list that has not
 787   been bound yet.
 788
 789 * Define a `not` function using mal itself. In `step4_if_fn_do.qx`
 790   call the `rep` function with this string:
 791   "(def! not (fn* (a) (if a false true)))".
 792
* Implement the string functions in `core.qx`. To implement these
  functions, you will need to implement the string support in the
  reader and printer (deferrable section of step 1). Each of the string
  functions takes multiple mal values, prints them (`pr_str`) and
  joins them together into a new string.
 798   * `pr-str`: calls `pr_str` on each argument with `print_readably`
 799     set to true, joins the results with " " and returns the new
 800     string.
 801   * `str`: calls `pr_str` on each argument with `print_readably` set
 802     to false, concatenates the results together ("" separator), and
 803     returns the new string.
 804   * `prn`:  calls `pr_str` on each argument with `print_readably` set
 805     to true, joins the results with " ", prints the string to the
 806     screen and then returns `nil`.
 807   * `println`:  calls `pr_str` on each argument with `print_readably` set
 808     to false, joins the results with " ", prints the string to the
 809     screen and then returns `nil`.
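
The four string functions described above differ only in the `print_readably` flag, the separator, and whether they print or return. A rough Python sketch under stated assumptions: the small `pr_str` here is a simplified stand-in for your real printer, and its escaping of double quotes, backslashes and newlines follows the usual step 1 behavior. The `mal_*` names and `string_ns` are hypothetical.

```python
class Symbol(str):
    """Stand-in for a mal symbol type (assumption for this sketch)."""


def pr_str(obj, print_readably=True):
    if obj is None:
        return "nil"
    if obj is True:
        return "true"
    if obj is False:
        return "false"
    if isinstance(obj, Symbol):
        return str(obj)
    if isinstance(obj, str):
        if not print_readably:
            return obj
        escaped = obj.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
        return '"' + escaped + '"'
    if isinstance(obj, list):
        return "(" + " ".join(pr_str(x, print_readably) for x in obj) + ")"
    if callable(obj):
        return "#<function>"
    return str(obj)


def mal_pr_str(*args):
    return " ".join(pr_str(a, True) for a in args)


def mal_str(*args):
    return "".join(pr_str(a, False) for a in args)


def mal_prn(*args):
    print(" ".join(pr_str(a, True) for a in args))
    return None  # nil


def mal_println(*args):
    print(" ".join(pr_str(a, False) for a in args))
    return None  # nil


string_ns = {
    "pr-str": mal_pr_str,
    "str": mal_str,
    "prn": mal_prn,
    "println": mal_println,
}
```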
 810
 811
 812 <a name="step5"></a>
 813
 814 ### Step 5: Tail call optimization
 815
 816 ![step5_tco architecture](step5_tco.png)
 817
In step 4 you added special forms `do`, `if` and `fn*` and you defined
some core functions. In this step you will add a Lisp feature called
tail call optimization (TCO), also called "tail recursion" or
sometimes just "tail calls".
 822
 823 Several of the special forms that you have defined in `EVAL` end up
 824 calling back into `EVAL`. For those forms that call `EVAL` as the last
 825 thing that they do before returning (tail call) you will just loop back
 826 to the beginning of eval rather than calling it again. The advantage
 827 of this approach is that it avoids adding more frames to the call
 828 stack. This is especially important in Lisp languages because they tend
 829 to prefer using recursion instead of iteration for control structures.
 830 (Though some Lisps, such as Common Lisp, have iteration.) However, with
 831 tail call optimization, recursion can be made as stack efficient as
 832 iteration.
 833
 834 Compare the pseudocode for step 4 and step 5 to get a basic idea of
 835 the changes that will be made during this step:
 836 ```
 837 diff -urp ../process/step4_if_fn_do.txt ../process/step5_tco.txt
 838 ```
 839
 840 * Copy `step4_if_fn_do.qx` to `step5_tco.qx`.
 841
 842 * Add a loop (e.g. while true) around all code in `EVAL`.
 843
 844 * Modify each of the following form cases to add tail call recursion
 845   support:
 846   * `let*`: remove the final `EVAL` call on the second `ast` argument
 847     (third list element). Set `env` (i.e. the local variable passed in
 848     as second parameter of `EVAL`) to the new let environment. Set
 849     `ast` (i.e. the local variable passed in as first parameter of
 850     `EVAL`) to be the second `ast` argument. Continue at the beginning
 851     of the loop (no return).
 852   * `do`: change the `eval_ast` call to evaluate all the parameters
 853     except for the last (2nd list element up to but not including
 854     last). Set `ast` to the last element of `ast`. Continue
 855     at the beginning of the loop (`env` stays unchanged).
 856   * `if`: the condition continues to be evaluated, however, rather
 857     than evaluating the true or false branch, `ast` is set to the
 858     unevaluated value of the chosen branch. Continue at the beginning
 859     of the loop (`env` is unchanged).
 860
 861 * The return value from the `fn*` special form will now become an
 862   object/structure with attributes that allow the default invoke case
 863   of `EVAL` to do TCO on mal functions. Those attributes are:
 864   * `ast`: the second `ast` argument (third list element) representing
 865     the body of the function.
 866   * `params`: the first `ast` argument (second list element)
 867     representing the parameter names of the function.
 868   * `env`: the current value of the `env` parameter of `EVAL`.
  * `fn`: the original function value (i.e. what was returned by `fn*`
    in step 4). Note that this is deferrable until step 9, when it is
    needed for the `map` and `apply` core functions.
 872
* The default "apply"/invoke case of `EVAL` must now be changed to
  account for the new object/structure returned by the `fn*` form.
  Continue to call `eval_ast` on `ast`. The first element of the
  result is `f`. Switch on the type of `f`:
  * regular function (not one defined by `fn*`): apply/invoke it as
    before (in step 4).
  * a `fn*` value: set `ast` to the `ast` attribute of `f`. Generate
    a new environment using the `env` and `params` attributes of `f`
    as the `outer` and `binds` arguments and the rest of the evaluated
    list (elements 2 through the end) as the `exprs` argument. Set
    `env` to the new environment. Continue at the beginning of the loop.
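
Put together, the loop-based `EVAL` described above might look like the following Python sketch. It is an illustration under assumptions, not a reference implementation. `Symbol`, `MalFn` and the list-based representation of mal lists are hypothetical choices, and the deferrable `fn` attribute is omitted.

```python
class Symbol(str):
    """Stand-in for a mal symbol type (assumption for this sketch)."""


class Env:
    def __init__(self, outer=None, binds=(), exprs=()):
        self.outer, self.data = outer, dict(zip(binds, exprs))

    def set(self, key, value):
        self.data[key] = value
        return value

    def get(self, key):
        env = self
        while env is not None:
            if key in env.data:
                return env.data[key]
            env = env.outer
        raise KeyError(f"'{key}' not found")


class MalFn:
    """What `fn*` returns in step 5: the pieces EVAL needs for TCO."""

    def __init__(self, ast, params, env):
        self.ast, self.params, self.env = ast, params, env


def eval_ast(ast, env):
    if isinstance(ast, Symbol):
        return env.get(ast)
    if isinstance(ast, list):
        return [EVAL(item, env) for item in ast]
    return ast


def EVAL(ast, env):
    while True:  # looping replaces the tail calls back into EVAL
        if not isinstance(ast, list):
            return eval_ast(ast, env)
        if not ast:
            return ast
        head = ast[0] if isinstance(ast[0], Symbol) else None
        if head == "def!":
            return env.set(ast[1], EVAL(ast[2], env))
        if head == "let*":
            let_env = Env(env)
            for i in range(0, len(ast[1]), 2):
                let_env.set(ast[1][i], EVAL(ast[1][i + 1], let_env))
            ast, env = ast[2], let_env
            continue
        if head == "do":
            eval_ast(ast[1:-1], env)  # everything except the last form
            ast = ast[-1]
            continue
        if head == "if":
            cond = EVAL(ast[1], env)
            if cond is not None and cond is not False:
                ast = ast[2]
            else:
                ast = ast[3] if len(ast) > 3 else None
            continue
        if head == "fn*":
            return MalFn(ast[2], ast[1], env)
        f, *args = eval_ast(ast, env)
        if isinstance(f, MalFn):
            ast, env = f.ast, Env(f.env, f.params, args)
            continue
        return f(*args)  # regular (native) function
```

With this structure, a tail-recursive mal function can loop many thousands of times without growing the Python call stack, because each tail call only rebinds `ast` and `env`.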
 884
 885 Run some manual tests from previous steps to make sure you have not
 886 broken anything by adding TCO.
 887
 888 Now go to the top level, run the step 5 tests.
 889
 890 ```
 891 make "test^quux^step5"
 892 ```
 893
Look at the step 5 test file `tests/step5_tco.mal`. The `sum-to`
function cannot be tail call optimized because it does something after
the recursive call (`sum-to` calls itself and then does the addition).
Lispers say that the recursive call in `sum-to` is not in tail
position. The `sum2` function, however, calls itself from tail
position. In other words, the recursive call to `sum2` is the last
action that `sum2` does. Calling `sum-to` with a large value will cause
a stack overflow exception in most target languages (some have
super-special tricks they use to avoid stack overflows).
 903
 904 Congratulations, your mal implementation already has a feature (TCO)
 905 that most mainstream languages lack.
 906
 907
 908 <a name="step6"></a>
 909
 910 ### Step 6: Files, Mutation, and Evil
 911
 912 ![step6_file architecture](step6_file.png)
 913
 914 In step 5 you added tail call optimization. In this step you will add
 915 some string and file operations and give your implementation a touch
 916 of evil ... er, eval. And as long as your language supports function
 917 closures, this step will be quite simple. However, to complete this
 918 step, you must implement string type support, so if you have been
 919 holding off on that you will need to go back and do so.
 920
 921 Compare the pseudocode for step 5 and step 6 to get a basic idea of
 922 the changes that will be made during this step:
 923 ```
 924 diff -urp ../process/step5_tco.txt ../process/step6_file.txt
 925 ```
 926
 927 * Copy `step5_tco.qx` to `step6_file.qx`.
 928
* Add two new string functions to the core namespace:
 930   * `read-string`: this function just exposes the `read_str` function
 931     from the reader. If your mal string type is not the same as your
 932     target language (e.g. statically typed language) then your
 933     `read-string` function will need to unbox (extract) the raw string
 934     from the mal string type in order to call `read_str`.
 935   * `slurp`: this function takes a file name (string) and returns the
 936     contents of the file as a string. Once again, if your mal string
 937     type wraps a raw target language string, then you will need to
 938     unmarshall (extract) the string parameter to get the raw file name
 939     string and marshall (wrap) the result back to a mal string type.
 940
* In your main program, add a new symbol "eval" to your REPL
  environment. The value of this new entry is a function that takes
  a single argument `ast`. The closure calls your `EVAL` function
  using the `ast` as the first argument and the REPL environment
  (closed over from outside) as the second argument. The result of
  the `EVAL` call is returned. This simple but powerful addition
  allows your program to treat mal data as a mal program. For example,
  you can now do this:
 949 ```
 950 (def! mal-prog (list + 1 2))
 951 (eval mal-prog)
 952 ```
 953
 954 * Define a `load-file` function using mal itself. In your main
 955   program call the `rep` function with this string:
 956   "(def! load-file (fn* (f) (eval (read-string (str \"(do \" (slurp f) \")\")))))".
 957
 958 Try out `load-file`:
 959   * `(load-file "../tests/incA.mal")` -> `9`
 960   * `(inc4 3)` -> `7`
 961
 962 The `load-file` function does the following:
 963   * Call `slurp` to read in a file by name. Surround the contents with
 964     "(do ...)" so that the whole file will be treated as a single
 965     program AST (abstract syntax tree).
 966   * Call `read-string` on the string returned from `slurp`. This uses
 967     the reader to read/convert the file contents into mal data/AST.
 968   * Call `eval` (the one in the REPL environment) on the AST returned
 969     from `read-string` to "run" it.
 970
 971 Besides adding file and eval support, we'll add support for the atom data type
 972 in this step.  An atom is the Mal way to represent *state*; it is
 973 heavily inspired by [Clojure's atoms](http://clojure.org/state).  An atom holds
 974 a reference to a single Mal value of any type; it supports reading that Mal value
 975 and *modifying* the reference to point to another Mal value.  Note that this is
 976 the only Mal data type that is mutable (but the Mal values it refers to are
 977 still immutable; immutability is explained in greater detail in step 7).
 978 You'll need to add 5 functions to the core namespace to support atoms:
 979
 980   * `atom`: Takes a Mal value and returns a new atom which points to that Mal value.
 981   * `atom?`: Takes an argument and returns `true` if the argument is an atom.
 982   * `deref`: Takes an atom argument and returns the Mal value referenced by this atom.
 983   * `reset!`: Takes an atom and a Mal value; the atom is modified to refer to
 984     the given Mal value. The Mal value is returned.
 985   * `swap!`: Takes an atom, a function, and zero or more function arguments. The
 986     atom's value is modified to the result of applying the function with the atom's
 987     value as the first argument and the optionally given function arguments as
 988     the rest of the arguments. The new atom's value is returned. (Side note: Mal is
 989     single-threaded, but in concurrent languages like Clojure, `swap!` promises
 990     atomic update: `(swap! myatom (fn* [x] (+ 1 x)))` will always increase the
 991     `myatom` counter by one and will not suffer from missing updates when the
 992     atom is updated from multiple threads.)
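
The five atom functions are small enough to sketch in a few lines. This Python version is only illustrative. The `Atom` class and the `atom_ns` dictionary are hypothetical names, and `swap` assumes the function argument is directly callable. If your mal functions are the step 5 objects, you would invoke them through whatever mechanism your implementation uses for that.

```python
class Atom:
    """Mutable reference to a single mal value."""

    def __init__(self, value):
        self.value = value


def reset(atom, value):
    atom.value = value
    return value


def swap(atom, fn, *args):
    # The atom's current value is passed as the first argument to fn,
    # followed by any extra arguments given to swap!.
    atom.value = fn(atom.value, *args)
    return atom.value


atom_ns = {
    "atom": Atom,
    "atom?": lambda x: isinstance(x, Atom),
    "deref": lambda a: a.value,
    "reset!": reset,
    "swap!": swap,
}
```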
 993
Optionally, you can add a reader macro `@` which will serve as a short form for
`deref`, so that `@a` is equivalent to `(deref a)`.  In order to do that, modify
the conditional in the reader's `read_form` function and add a case which deals with
the `@` token: if the token is `@` (at sign) then return a new list that
contains the symbol `deref` and the result of reading the next form
(`read_form`).
1000
1001 Now go to the top level, run the step 6 tests. The optional tests will
1002 need support from the reader for comments, vectors, hash-maps and the `@`
1003 reader macro:
1004 ```
1005 make "test^quux^step6"
1006 ```
1007
1008 Congratulations, you now have a full-fledged scripting language that
1009 can run other mal programs. The `slurp` function loads a file as
1010 a string, the `read-string` function calls the mal reader to turn that
1011 string into data, and the `eval` function takes data and evaluates it
1012 as a normal mal program. However, it is important to note that the
1013 `eval` function is not just for running external programs. Because mal
1014 programs are regular mal data structures, you can dynamically generate
1015 or manipulate those data structures before calling `eval` on them.
1016 This isomorphism (same shape) between data and programs is known as
1017 "homoiconicity". Lisp languages are homoiconic and this property
1018 distinguishes them from most other programming languages.
1019
Your mal implementation is quite powerful already but the set of
functions that are available (from `core.qx`) is fairly limited. The
bulk of the functions you will add are described in step 9 and step A,
but you will begin to flesh them out over the next few steps to
support quoting (step 7) and macros (step 8).
1025
1026
1027 #### Deferrable:
1028
1029 * Add the ability to run another mal program from the command line.
1030   Prior to the REPL loop, check if your mal implementation is called
1031   with command line arguments. If so, treat the first argument as
1032   a filename and use `rep` to call `load-file` on that filename, and
1033   finally exit/terminate execution.
1034
1035 * Add the rest of the command line arguments to your REPL environment
1036   so that programs that are run with `load-file` have access to their
1037   calling environment. Add a new "\*ARGV\*" (symbol) entry to your REPL
1038   environment. The value of this entry should be the rest of the
1039   command line arguments as a mal list value.
1040
1041
1042 <a name="step7"></a>
1043
1044 ### Step 7: Quoting
1045
1046 ![step7_quote architecture](step7_quote.png)
1047
1048 In step 7 you will add the special forms `quote` and `quasiquote` and
1049 add supporting core functions `cons` and `concat`. The two quote forms
1050 add a powerful abstraction for manipulating mal code itself
1051 (meta-programming).
1052
1053 The `quote` special form indicates to the evaluator (`EVAL`) that the
1054 parameter should not be evaluated (yet). At first glance, this might
1055 not seem particularly useful but an example of what this enables is the
1056 ability for a mal program to refer to a symbol itself rather than the
1057 value that it evaluates to. Likewise with lists. For example, consider
1058 the following:
1059
1060 * `(prn abc)`: this will lookup the symbol `abc` in the current
1061   evaluation environment and print it. This will result in error if
1062   `abc` is not defined.
1063 * `(prn (quote abc))`: this will print "abc" (prints the symbol
1064   itself). This will work regardless of whether `abc` is defined in
1065   the current environment.
1066 * `(prn (1 2 3))`: this will result in an error because `1` is not
1067   a function and cannot be applied to the arguments `(2 3)`.
1068 * `(prn (quote (1 2 3)))`: this will print "(1 2 3)".
1069 * `(def! l (quote (1 2 3)))`: list quoting allows us to define lists
1070   directly in the code (list literal). Another way of doing this is
1071   with the list function: `(def! l (list 1 2 3))`.
1072
1073 The second special quoting form is `quasiquote`. This allows a quoted
1074 list to have internal elements of the list that are temporarily
1075 unquoted (normal evaluation). There are two special forms that only
1076 mean something within a quasiquoted list: `unquote` and
1077 `splice-unquote`. These are perhaps best explained with some examples:
1078
1079 * `(def! lst (quote (2 3)))` -> `(2 3)`
1080 * `(quasiquote (1 (unquote lst)))` -> `(1 (2 3))`
1081 * `(quasiquote (1 (splice-unquote lst)))` -> `(1 2 3)`
1082
1083 The `unquote` form turns evaluation back on for its argument and the
1084 result of evaluation is put in place into the quasiquoted list. The
1085 `splice-unquote` also turns evaluation back on for its argument, but
1086 the evaluated value must be a list which is then "spliced" into the
1087 quasiquoted list. The true power of the quasiquote form will be
1088 manifest when it is used together with macros (in the next step).
1089
1090 Compare the pseudocode for step 6 and step 7 to get a basic idea of
1091 the changes that will be made during this step:
1092 ```
1093 diff -urp ../process/step6_file.txt ../process/step7_quote.txt
1094 ```
1095
1096 * Copy `step6_file.qx` to `step7_quote.qx`.
1097
* Before implementing the quoting forms, you will need to implement
  some supporting functions in the core namespace:
  * `cons`: this function takes a list as its second
    parameter and returns a new list that has the first argument
    prepended to it.
  * `concat`: this function takes 0 or more lists as
    parameters and returns a new list that is a concatenation of all
    the list parameters.
1106
An aside on immutability: note that neither `cons` nor `concat` mutates
its original list arguments. Any references to them (i.e. other
lists that they may be "contained" in) will still refer to the
original unchanged value. Mal, like Clojure, is a language which uses
immutable data structures. I encourage you to read about the power and
importance of immutability as implemented in Clojure (from which
Mal borrows most of its syntax and feature-set).
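
To make the non-mutating behavior concrete, here is a small Python sketch of `cons` and `concat`, assuming mal lists are represented as Python lists. Both functions build a fresh list and leave their arguments untouched.

```python
def cons(item, lst):
    # Build a new list; lst itself is never modified.
    return [item] + list(lst)


def concat(*lists):
    result = []
    for lst in lists:
        result.extend(lst)  # only the new result list is mutated
    return result
```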
1114
1115 * Add the `quote` special form. This form just returns its argument
1116   (the second list element of `ast`).
1117
1118 * Add the `quasiquote` special form. First implement a helper function
1119   `is_pair` that returns true if the parameter is a non-empty list.
1120   Then define a `quasiquote` function. This is called from `EVAL` with
1121   the first `ast` argument (second list element) and then `ast` is set
1122   to the result and execution continues at the top of the loop (TCO).
1123   The `quasiquote` function takes a parameter `ast` and has the
1124   following conditional:
1125   1. if `is_pair` of `ast` is false: return a new list containing:
1126      a symbol named "quote" and `ast`.
1127   2. else if the first element of `ast` is a symbol named "unquote":
1128      return the second element of `ast`.
  3. else if `is_pair` of the first element of `ast` is true and the first
     element of first element of `ast` (`ast[0][0]`) is a symbol named
     "splice-unquote": return a new list containing: a symbol named
     "concat", the second element of first element of `ast`
     (`ast[0][1]`), and the result of calling `quasiquote` with the
     second through last element of `ast`.
1135   4. otherwise: return a new list containing: a symbol named "cons", the
1136      result of calling `quasiquote` on first element of `ast`
1137      (`ast[0]`), and the result of calling `quasiquote` with the second
1138      through last element of `ast`.
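
The four-way conditional above translates almost line for line into code. The following Python sketch assumes mal lists are Python lists and uses a hypothetical `Symbol` class and `is_symbol` helper. It only builds the expanded form. Evaluating the resulting `cons`/`concat` calls is left to `EVAL`.

```python
class Symbol(str):
    """Stand-in for a mal symbol type (assumption for this sketch)."""


def is_pair(x):
    return isinstance(x, list) and len(x) > 0


def is_symbol(x, name):
    return isinstance(x, Symbol) and x == name


def quasiquote(ast):
    # 1. Not a non-empty list: quote it as-is.
    if not is_pair(ast):
        return [Symbol("quote"), ast]
    # 2. (unquote x): hand x back so that it gets evaluated.
    if is_symbol(ast[0], "unquote"):
        return ast[1]
    # 3. ((splice-unquote x) ...): concat x with the processed rest.
    if is_pair(ast[0]) and is_symbol(ast[0][0], "splice-unquote"):
        return [Symbol("concat"), ast[0][1], quasiquote(ast[1:])]
    # 4. Otherwise: cons the processed first element onto the processed rest.
    return [Symbol("cons"), quasiquote(ast[0]), quasiquote(ast[1:])]
```

For example, the form `(1 (unquote lst))` expands to `(cons (quote 1) (cons lst (quote ())))`, which evaluates to `(1 (2 3))` when `lst` is `(2 3)`.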
1139
1140
1141 Now go to the top level, run the step 7 tests:
1142 ```
1143 make "test^quux^step7"
1144 ```
1145
Quoting is one of the more mundane functions available in mal, but do
not let that discourage you. Your mal implementation is almost
complete, and quoting sets the stage for the next very exciting step:
macros.
1150
1151
1152 #### Deferrable
1153
* The full names for the quoting forms are fairly verbose. Most Lisp
  languages have a short-hand syntax and Mal is no exception. These
  short-hand syntaxes are known as reader macros because they allow us
  to manipulate mal code during the reader phase. Macros that run
  during the eval phase are just called "macros" and are described in
  the next section. Expand the conditional in the reader's `read_form`
  function to add the following four cases:
1161   * token is "'" (single quote): return a new list that contains the
1162     symbol "quote" and the result of reading the next form
1163     (`read_form`).
1164   * token is "\`" (back-tick): return a new list that contains the
1165     symbol "quasiquote" and the result of reading the next form
1166     (`read_form`).
1167   * token is "~" (tilde): return a new list that contains the
1168     symbol "unquote" and the result of reading the next form
1169     (`read_form`).
1170   * token is "~@" (tilde + at sign): return a new list that contains
1171     the symbol "splice-unquote" and the result of reading the next
1172     form (`read_form`).
1173
* Add support for quoting of vectors. The `is_pair` function should
  return true if the argument is a non-empty list or vector. `cons`
  should also accept a vector as the second argument. The return value
  is a list regardless. `concat` should support concatenation of
  lists, vectors, or a mix of both. The result is always a list.
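
The reader macro cases listed above all follow the same pattern: wrap the next form in a two-element list headed by the corresponding symbol. The Python sketch below shows that pattern with a deliberately simplified reader. The tokenizer only understands parentheses, integers, symbols and the reader macro characters, and the names `tokenize` and `READER_MACROS` are hypothetical. Your step 1 reader will likely be structured differently. The `@` case from step 6 is included because it fits the same table.

```python
import re


class Symbol(str):
    """Stand-in for a mal symbol type (assumption for this sketch)."""


# Simplified tokenizer: no strings, comments, vectors or hash-maps.
TOKEN_RE = re.compile(r"[\s,]*(~@|[()'`~@]|[^\s,()'`~@]+)")


def tokenize(text):
    return [t for t in TOKEN_RE.findall(text) if t]


# Reader macro token -> symbol that heads the expanded list.
READER_MACROS = {
    "'": "quote",
    "`": "quasiquote",
    "~": "unquote",
    "~@": "splice-unquote",
    "@": "deref",
}


def read_form(tokens):
    """Consume tokens from the front of the list and return one mal form."""
    token = tokens.pop(0)
    if token in READER_MACROS:
        # Expand to (symbol next-form).
        return [Symbol(READER_MACROS[token]), read_form(tokens)]
    if token == "(":
        items = []
        while tokens[0] != ")":
            items.append(read_form(tokens))
        tokens.pop(0)  # drop the closing ")"
        return items
    if re.fullmatch(r"-?\d+", token):
        return int(token)
    return Symbol(token)
```

Because `~@` is listed before the single-character alternatives in the regular expression, it is matched as one token rather than as `~` followed by `@`.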
1179
1180
1181 <a name="step8"></a>
1182
1183 ### Step 8: Macros
1184
1185 ![step8_macros architecture](step8_macros.png)
1186
Your mal implementation is now ready for one of the most lispy and
exciting of all programming concepts: macros. In the previous step,
quoting enabled some simple manipulation of data structures and
therefore manipulation of mal code (because the `eval` function from
step 6 turns mal data into code). In this step you will be able to mark
mal functions as macros which can manipulate mal code before it is
evaluated. In other words, macros are user-defined special forms. Or
to look at it another way, macros allow mal programs to redefine
the mal language itself.
1196
1197 Compare the pseudocode for step 7 and step 8 to get a basic idea of
1198 the changes that will be made during this step:
1199 ```
1200 diff -urp ../process/step7_quote.txt ../process/step8_macros.txt
1201 ```
1202
1203 * Copy `step7_quote.qx` to `step8_macros.qx`.
1204
1205
1206 You might think that the infinite power of macros would require some
1207 sort of complex mechanism, but the implementation is actually fairly
1208 simple.
1209
1210 * Add a new attribute `is_macro` to mal function types. This should
1211   default to false.
1212
1213 * Add a new special form `defmacro!`. This is very similar to the
1214   `def!` form, but before the evaluated value (mal function) is set in
1215   the environment, the `is_macro` attribute should be set to true.
1216
* Add an `is_macro_call` function: This function takes arguments `ast`
  and `env`. It returns true if `ast` is a list that contains a symbol
  as the first element and that symbol refers to a function in the
  `env` environment and that function has the `is_macro` attribute set
  to true. Otherwise, it returns false.
1222
1223 * Add a `macroexpand` function: This function takes arguments `ast`
1224   and `env`. It calls `is_macro_call` with `ast` and `env` and loops
1225   while that condition is true. Inside the loop, the first element of
1226   the `ast` list (a symbol), is looked up in the environment to get
1227   the macro function. This macro function is then called/applied with
1228   the rest of the `ast` elements (2nd through the last) as arguments.
1229   The return value of the macro call becomes the new value of `ast`.
1230   When the loop completes because `ast` no longer represents a macro
1231   call, the current value of `ast` is returned.
1232
1233 * In the evaluator (`EVAL`) before the special forms switch (apply
1234   section), perform macro expansion by calling the `macroexpand`
1235   function with the current value of `ast` and `env`. Set `ast` to the
1236   result of that call. If the new value of `ast` is no longer a list
1237   after macro expansion, then return the result of calling `eval_ast`
1238   on it, otherwise continue with the rest of the apply section
1239   (special forms switch).
1240
1241 * Add a new special form condition for `macroexpand`. Call the
1242   `macroexpand` function using the first `ast` argument (second list
1243   element) and `env`. Return the result. This special form allows
1244   a mal program to do explicit macro expansion without applying the
1245   result (which can be useful for debugging macro expansion).
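
The two helper functions described above can be sketched as follows. This Python version makes several simplifying assumptions: `MalFn` wraps a native callable in a hypothetical `fn` attribute, `Env.find` is a helper that returns `None` for undefined symbols so that `is_macro_call` does not raise, and mal lists are Python lists.

```python
class Symbol(str):
    """Stand-in for a mal symbol type (assumption for this sketch)."""


class Env:
    def __init__(self, outer=None):
        self.outer, self.data = outer, {}

    def set(self, key, value):
        self.data[key] = value
        return value

    def find(self, key):
        """Return the innermost environment defining key, or None."""
        env = self
        while env is not None and key not in env.data:
            env = env.outer
        return env

    def get(self, key):
        env = self.find(key)
        if env is None:
            raise KeyError(f"'{key}' not found")
        return env.data[key]


class MalFn:
    """Mal function wrapper; `fn` is a native callable (simplification)."""

    def __init__(self, fn, is_macro=False):
        self.fn, self.is_macro = fn, is_macro


def is_macro_call(ast, env):
    if not (isinstance(ast, list) and ast and isinstance(ast[0], Symbol)):
        return False
    if env.find(ast[0]) is None:
        return False
    f = env.get(ast[0])
    return isinstance(f, MalFn) and f.is_macro


def macroexpand(ast, env):
    while is_macro_call(ast, env):
        macro = env.get(ast[0])
        # Arguments are passed unevaluated; the result replaces ast.
        ast = macro.fn(*ast[1:])
    return ast
```

In `EVAL`, the call to `macroexpand` would sit just before the special forms switch, followed by the check for whether the expanded `ast` is still a list.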
1246
1247 Now go to the top level, run the step 8 tests:
1248 ```
1249 make "test^quux^step8"
1250 ```
1251
1252 There is a reasonably good chance that the macro tests will not pass
1253 the first time. Although the implementation of macros is fairly
1254 simple, debugging runtime bugs with macros can be fairly tricky. If
1255 you do run into subtle problems that are difficult to solve, let me
1256 recommend a couple of approaches:
1257
* Use the macroexpand special form to eliminate one of the layers of
  indirection (to expand but skip evaluation). This will often reveal
  the source of the issue.
* Add a debug print statement to the top of your main `eval` function
  (inside the TCO loop) to print the current value of `ast` (hint: use
  `pr_str` to get easier to debug output). Pull up the step8
  implementation from another language and uncomment its `eval`
  function (yes, I give you permission to violate the rule this once).
  Run the two side-by-side. The first difference is likely to point to
  the bug.
1268
1269 Congratulations! You now have a Lisp interpreter with a super power
1270 that most non-Lisp languages can only dream of (I have it on good
1271 authority that languages dream when you are not using them). If you
1272 are not already familiar with Lisp macros, I suggest the following
1273 exercise: write a recursive macro that handles postfixed mal code
1274 (with the function as the last parameter instead of the first). Or
1275 not. I have not actually done so myself, but I have heard it is an
1276 interesting exercise.
1277
In the next step you will add try/catch style exception handling to
your implementation in addition to some new core functions. After
step 9 you will be very close to having a fully self-hosting mal
implementation. Let us continue!
1282
1283
1284 #### Deferrable
1285
1286 * Add the following new core functions which are frequently used in
1287   macro functions:
1288   * `nth`: this function takes a list (or vector) and a number (index)
1289     as arguments, returns the element of the list at the given index.
1290     If the index is out of range, this function raises an exception.
  * `first`: this function takes a list (or vector) as its argument
    and returns the first element. If the list (or vector) is empty or
    is `nil` then `nil` is returned.
1294   * `rest`: this function takes a list (or vector) as its argument and
1295     returns a new list containing all the elements except the first.
1296
* In the main program, use the `rep` function to define two new
  control structure macros. Here are the string arguments for `rep`
  to define these macros:
1300   * `cond`: "(defmacro! cond (fn* (& xs) (if (> (count xs) 0) (list 'if (first xs) (if (> (count xs) 1) (nth xs 1) (throw \"odd number of forms to cond\")) (cons 'cond (rest (rest xs)))))))"
1301   * `or`: "(defmacro! or (fn* (& xs) (if (empty? xs) nil (if (= 1 (count xs)) (first xs) `(let* (or_FIXME ~(first xs)) (if or_FIXME or_FIXME (or ~@(rest xs))))))))"
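
To make the `nth`, `first` and `rest` descriptions above concrete, here is a minimal sketch in Python. It is only an illustration: it assumes mal lists are plain Python lists, mal vectors are a hypothetical `Vector` subclass, and mal `nil` is `None`. Your own representation will almost certainly differ.

```python
# Sketch only: mal lists are modeled as Python lists, mal vectors as a
# hypothetical Vector subclass, and mal nil as None.

class Vector(list):
    """Hypothetical stand-in for a mal vector."""


def nth(seq, index):
    # Out-of-range access raises an exception. Negative indexes are
    # rejected explicitly because Python would otherwise wrap around.
    if index < 0 or index >= len(seq):
        raise IndexError("nth: index out of range")
    return seq[index]


def first(seq):
    # nil and empty sequences both yield nil.
    if seq is None or len(seq) == 0:
        return None
    return seq[0]


def rest(seq):
    # Always returns a plain list, even for a vector argument.
    # Returning an empty list for nil is an assumption of this sketch.
    if seq is None:
        return []
    return list(seq[1:])
```

Note that `rest` returns a list even when handed a vector, which matches the "returns a new list" wording above.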
1302
1303
1304 <a name="step9"></a>
1305
1306 ### Step 9: Try
1307
1308 ![step9_try architecture](step9_try.png)
1309
1310 In this step you will implement the final mal special form for
1311 error/exception handling: `try*/catch*`. You will also add several core
1312 functions to your implementation. In particular, you will enhance the
1313 functional programming pedigree of your implementation by adding the
1314 `apply` and `map` core functions.
1315
1316 Compare the pseudocode for step 8 and step 9 to get a basic idea of
1317 the changes that will be made during this step:
1318 ```
1319 diff -urp ../process/step8_macros.txt ../process/step9_try.txt
1320 ```
1321
1322 * Copy `step8_macros.qx` to `step9_try.qx`.
1323
1324 * Add the `try*/catch*` special form to the EVAL function. The
  try catch form looks like this: `(try* A (catch* B C))`. The form
  `A` is evaluated. If it throws an exception, then form `C` is
1327   evaluated with a new environment that binds the symbol `B` to the
1328   value of the exception that was thrown.
1329   * If your target language has built-in try/catch style exception
1330     handling then you are already 90% of the way done. Add a
1331     (native language) try/catch block that evaluates `A` within
1332     the try block and catches all exceptions. If an exception is
1333     caught, then translate it to a mal type/value. For native
1334     exceptions this is either the message string or a mal hash-map
1335     that contains the message string and other attributes of the
1336     exception. When a regular mal type/value is used as an
1337     exception, you will probably need to store it within a native
1338     exception type in order to be able to convey/transport it using
1339     the native try/catch mechanism. Then you will extract the mal
1340     type/value from the native exception. Create a new mal environment
1341     that binds `B` to the value of the exception. Finally, evaluate `C`
1342     using that new environment.
1343   * If your target language does not have built-in try/catch style
1344     exception handling then you have some extra work to do. One of the
    most straightforward approaches is to create a global error
1346     variable that stores the thrown mal type/value. The complication
1347     is that there are a bunch of places where you must check to see if
1348     the global error state is set and return without proceeding. The
1349     rule of thumb is that this check should happen at the top of your
1350     EVAL function and also right after any call to EVAL (and after any
1351     function call that might happen to call EVAL further down the
1352     chain). Yes, it is ugly, but you were warned in the section on
1353     picking a language.
1354
1355 * Add the `throw` core function.
1356   * If your language supports try/catch style exception handling, then
1357     this function takes a mal type/value and throws/raises it as an
1358     exception. In order to do this, you may need to create a custom
1359     exception object that wraps a mal value/type.
1360   * If your language does not support try/catch style exception
1361     handling, then set the global error state to the mal type/value.
1362
1363 * Add the `apply` and `map` core functions. In step 5, if you did not
1364   add the original function (`fn`) to the structure returned from
  `fn*`, then you will need to do so now.
1366   * `apply`: takes at least two arguments. The first argument is
    a function and the last argument is a list (or vector). The
1368     arguments between the function and the last argument (if there are
1369     any) are concatenated with the final argument to create the
1370     arguments that are used to call the function. The apply
1371     function allows a function to be called with arguments that are
1372     contained in a list (or vector). In other words, `(apply F A B [C
1373     D])` is equivalent to `(F A B C D)`.
1374   * `map`: takes a function and a list (or vector) and evaluates the
1375     function against every element of the list (or vector) one at
1376     a time and returns the results as a list.
1377
* Add some type predicate core functions. In Lisp, predicates are
1379   functions that return true/false (or true value/nil) and typically
1380   end in "?" or "p".
1381   * `nil?`: takes a single argument and returns true (mal true value)
1382     if the argument is nil (mal nil value).
1383   * `true?`: takes a single argument and returns true (mal true value)
1384     if the argument is a true value (mal true value).
1385   * `false?`: takes a single argument and returns true (mal true
1386     value) if the argument is a false value (mal false value).
1387   * `symbol?`: takes a single argument and returns true (mal true
1388     value) if the argument is a symbol (mal symbol value).
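
If your target language has native exceptions, the `try*/catch*` and `throw` pieces described above might fit together roughly like the following Python sketch. Everything here is an assumption made for illustration: `MalException` and `eval_try_catch` are hypothetical names, the environment is modeled as a plain dict rather than a real mal environment, and `eval_fn` stands in for your own EVAL.

```python
class MalException(Exception):
    """Hypothetical wrapper that carries an arbitrary mal value."""

    def __init__(self, value):
        super().__init__(value)
        self.value = value


def throw(value):
    # Core function `throw`: wrap the mal value and raise it natively.
    raise MalException(value)


def eval_try_catch(eval_fn, env, a, b, c):
    """Evaluate (try* A (catch* B C)).

    eval_fn(ast, env) is the interpreter's EVAL. The dict-based env is
    a simplification of a real mal environment.
    """
    try:
        return eval_fn(a, env)
    except MalException as exc:
        caught = exc.value   # a mal value that was thrown via `throw`
    except Exception as exc:
        caught = str(exc)    # native error translated to its message string
    # A copy of env with B added stands in for a new child environment.
    catch_env = dict(env)
    catch_env[b] = caught
    return eval_fn(c, catch_env)
```

The two `except` branches correspond to the two cases in the text: a mal value transported inside a native exception, and a native exception translated to a mal string.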
1389
Now go to the top level and run the step 9 tests:
1391 ```
1392 make "test^quux^step9"
1393 ```
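
If the `apply` and `map` tests fail, it can help to compare your code against the simplest possible version of the semantics described in this step. The Python sketch below assumes that mal functions are directly callable in the host language (in a real implementation you may need to go through the `fn` attribute mentioned above) and uses a hypothetical `Vector` class for mal vectors.

```python
class Vector(list):
    """Hypothetical stand-in for a mal vector."""


def mal_apply(f, *args):
    # The last argument must be a list or vector. Any arguments before
    # it are placed in front of its elements to form the full call.
    *leading, last = args
    return f(*leading, *last)


def mal_map(f, seq):
    # Calls f on each element in order. The result is always a plain
    # list, even when the input is a vector.
    return [f(x) for x in seq]
```

With these definitions, `mal_apply(F, A, B, Vector([C, D]))` calls `F(A, B, C, D)`, mirroring the `(apply F A B [C D])` example.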
1394
1395 Your mal implementation is now essentially a fully featured Lisp
1396 interpreter. But if you stop now you will miss one of the most
1397 satisfying and enlightening aspects of creating a mal implementation:
1398 self-hosting.
1399
1400 #### Deferrable
1401
1402 * Add the following new core functions:
1403   * `symbol`: takes a string and returns a new symbol with the string
1404     as its name.
1405   * `keyword`: takes a string and returns a keyword with the same name
    (usually just by prepending the special keyword
1407     unicode symbol). This function should also detect if the argument
1408     is already a keyword and just return it.
1409   * `keyword?`: takes a single argument and returns true (mal true
1410     value) if the argument is a keyword, otherwise returns false (mal
1411     false value).
1412   * `vector`: takes a variable number of arguments and returns
1413     a vector containing those arguments.
1414   * `vector?`: takes a single argument and returns true (mal true
1415     value) if the argument is a vector, otherwise returns false (mal
1416     false value).
1417   * `hash-map`: takes a variable but even number of arguments and
1418     returns a new mal hash-map value with keys from the odd arguments
1419     and values from the even arguments respectively. This is basically
1420     the functional form of the `{}` reader literal syntax.
1421   * `map?`: takes a single argument and returns true (mal true
1422     value) if the argument is a hash-map, otherwise returns false (mal
1423     false value).
1424   * `assoc`: takes a hash-map as the first argument and the remaining
1425     arguments are odd/even key/value pairs to "associate" (merge) into
1426     the hash-map. Note that the original hash-map is unchanged
1427     (remember, mal values are immutable), and a new hash-map
    containing the old hash-map's key/values plus the merged key/value
1429     arguments is returned.
1430   * `dissoc`: takes a hash-map and a list of keys to remove from the
1431     hash-map. Again, note that the original hash-map is unchanged and
1432     a new hash-map with the keys removed is returned. Key arguments
1433     that do not exist in the hash-map are ignored.
1434   * `get`: takes a hash-map and a key and returns the value of looking
1435     up that key in the hash-map. If the key is not found in the
1436     hash-map then nil is returned.
1437   * `contains?`: takes a hash-map and a key and returns true (mal true
1438     value) if the key exists in the hash-map and false (mal false
1439     value) otherwise.
1440   * `keys`: takes a hash-map and returns a list (mal list value) of
1441     all the keys in the hash-map.
1442   * `vals`: takes a hash-map and returns a list (mal list value) of
1443     all the values in the hash-map.
  * `sequential?`: takes a single argument and returns true (mal true
1445     value) if it is a list or a vector, otherwise returns false (mal
1446     false value).
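
The hash-map functions above all share one rule: never modify the hash-map you were given. Here is a rough Python sketch of that idea. It assumes a mal hash-map can be modeled as a Python dict with hashable keys, which is a simplification, and the function names are just illustrative.

```python
def hash_map(*args):
    # Odd-position arguments are keys, even-position arguments are values.
    if len(args) % 2 != 0:
        raise ValueError("hash-map: odd number of arguments")
    return dict(zip(args[0::2], args[1::2]))


def assoc(hm, *kvs):
    # Copy first so the original hash-map is left unchanged.
    new = dict(hm)
    new.update(hash_map(*kvs))
    return new


def dissoc(hm, *keys):
    # Keys that are not present are simply ignored.
    return {k: v for k, v in hm.items() if k not in keys}


def get(hm, key):
    # A missing key yields nil (None in this sketch).
    return hm.get(key)


def contains(hm, key):
    return key in hm


def keys(hm):
    return list(hm.keys())


def vals(hm):
    return list(hm.values())
```

Because `assoc` and `dissoc` build new dicts, any code still holding the old hash-map keeps seeing the old contents.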
1447
1448
1449 <a name="stepA"></a>
1450
1451 ### Step A: Metadata, Self-hosting and Interop
1452
1453 ![stepA_mal architecture](stepA_mal.png)
1454
1455 You have reached the final step of your mal implementation. This step
1456 is kind of a catchall for things that did not fit into other steps.
1457 But most importantly, the changes you make in this step will unlock
1458 the magical power known as "self-hosting". You might have noticed
1459 that one of the languages that mal is implemented in is "mal". Any mal
1460 implementation that is complete enough can run the mal implementation
1461 of mal. You might need to pull out your hammock and ponder this for
1462 a while if you have never built a compiler or interpreter before. Look
1463 at the step source files for the mal implementation of mal (it is not
1464 cheating now that you have reached step A).
1465
1466 If you deferred the implementation of keywords, vectors and hash-maps,
1467 now is the time to go back and implement them if you want your
1468 implementation to self-host.
1469
1470 Compare the pseudocode for step 9 and step A to get a basic idea of
1471 the changes that will be made during this step:
1472 ```
1473 diff -urp ../process/step9_try.txt ../process/stepA_mal.txt
1474 ```
1475
1476 * Copy `step9_try.qx` to `stepA_mal.qx`.
1477
* Add the `readline` core function. This function takes a
1479   string that is used to prompt the user for input. The line of text
1480   entered by the user is returned as a string. If the user sends an
1481   end-of-file (usually Ctrl-D), then nil is returned.
1482
1483 * Add meta-data support to mal functions. TODO. Should be separate
1484   from the function macro flag.
1485
1486 * Add a new "\*host-language\*" (symbol) entry to your REPL
1487   environment. The value of this entry should be a mal string
  containing the name of the current implementation.
1489
1490 * When the REPL starts up (as opposed to when it is called with
1491   a script and/or arguments), call the `rep` function with this string
1492   to print a startup header:
1493   "(println (str \"Mal [\" *host-language* \"]\"))".
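
As an illustration of the `readline` behavior described above, here is a small Python sketch. The name `mal_readline` and the optional `stdin`/`stdout` parameters are assumptions added so the function can be exercised without a terminal. A real implementation would more likely wrap whatever line-editing support you set up in step 0.

```python
import sys


def mal_readline(prompt, stdin=None, stdout=None):
    # Fall back to the process streams when none are supplied.
    if stdin is None:
        stdin = sys.stdin
    if stdout is None:
        stdout = sys.stdout
    stdout.write(prompt)
    stdout.flush()
    line = stdin.readline()
    # An empty string (no newline at all) means end-of-file: return nil.
    if line == "":
        return None
    # Otherwise strip the trailing newline and return the text.
    return line.rstrip("\n")
```

The key detail is telling an empty line (the user just pressed Enter, which returns `""`) apart from end-of-file (which returns nil).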
1494
1495
Now go to the top level and run the step A tests:
1497 ```
1498 make "test^quux^stepA"
1499 ```
1500
1501 Once you have passed all the non-optional step A tests, it is time to
1502 try self-hosting. Run your step A implementation as normal, but use
the file argument mode you added in step 6 to run each of the steps
from the mal implementation:
1505 ```
1506 ./stepA_mal.qx ../mal/step1_read_print.mal
1507 ./stepA_mal.qx ../mal/step2_eval.mal
1508 ...
1509 ./stepA_mal.qx ../mal/step9_try.mal
1510 ./stepA_mal.qx ../mal/stepA_mal.mal
1511 ```
1512
1513 There is a very good chance that you will encounter an error at some
1514 point while trying to run the mal in mal implementation steps above.
1515 Debugging failures that happen while self-hosting is MUCH more
1516 difficult and mind bending. One of the best approaches I have
1517 personally found is to add prn statements to the mal implementation
1518 step (not your own implementation of mal) that is causing problems.
1519
1520 Another approach I have frequently used is to pull out the code from
1521 the mal implementation that is causing the problem and simplify it
1522 step by step until you have a simple piece of mal code that still
1523 reproduces the problem. Once the reproducer is simple enough you will
1524 probably know where in your own implementation that problem is likely
1525 to be. Please add your simple reproducer as a test case so that future
1526 implementers will fix similar issues in their code before they get to
1527 self-hosting when it is much more difficult to track down and fix.
1528
1529 Once you can manually run all the self-hosted steps, it is time to run
1530 all the tests in self-hosted mode:
1531 ```
1532 make MAL_IMPL=quux "test^mal"
1533 ```
1534
1535 When you run into problems (which you almost certainly will), use the
1536 same process described above to debug them.
1537
1538 Congratulations!!! When all the tests pass, you should pause for
1539 a moment and consider what you have accomplished. You have implemented
1540 a Lisp interpreter that is powerful and complete enough to run a large
1541 mal program which is itself an implementation of the mal language. You
1542 might even be asking if you can continue the "inception" by using your
1543 implementation to run a mal implementation which itself runs the mal
1544 implementation.
1545
1546
1547 #### Optional: gensym
1548
1549 The `or` macro we introduced at step 8 has a bug. It defines a
1550 variable called `or_FIXME`, which "shadows" such a binding from the
1551 user's code (which uses the macro). If a user has a variable called
1552 `or_FIXME`, it cannot be used as an `or` macro argument. In order to
1553 fix that, we'll introduce `gensym`: a function which returns a symbol
1554 which was never used before anywhere in the program. This is also an
example of using mal atoms to keep state (the state here being
1556 the number of symbols produced by `gensym` so far).
1557
1558 Previously you used `rep` to define the `or` macro. Remove that
1559 definition and use `rep` to define the new counter, `gensym` function
1560 and the clean `or` macro. Here are the string arguments you need to
1561 pass to `rep`:
1562 ```
1563 "(def! *gensym-counter* (atom 0))"
1564
1565 "(def! gensym (fn* [] (symbol (str \"G__\" (swap! *gensym-counter* (fn* [x] (+ 1 x)))))))"
1566
1567 "(defmacro! or (fn* (& xs) (if (empty? xs) nil (if (= 1 (count xs)) (first xs) (let* (condvar (gensym)) `(let* (~condvar ~(first xs)) (if ~condvar ~condvar (or ~@(rest xs)))))))))"
1568 ```
1569
1570 For extra information read [Peter Seibel's thorough discussion about
1571 `gensym` and leaking macros in Common Lisp](http://www.gigamonkeys.com/book/macros-defining-your-own.html#plugging-the-leaks).
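
For comparison, the counter-based idea behind the mal `gensym` above looks like this in Python. This is purely an illustration of the technique, not something the guide requires: symbols are modeled as plain strings, and `_gensym_counter` is a hypothetical name.

```python
import itertools

# Shared counter: this is the state that the mal version keeps in an atom.
_gensym_counter = itertools.count(1)


def gensym():
    # Each call consumes the next counter value, so no two calls can
    # return the same name.
    return "G__" + str(next(_gensym_counter))
```

Keep in mind that a counter only guarantees the generated names differ from each other. Nothing stops user code from spelling out a name like `G__1` by hand, so the approach relies on a prefix that users are unlikely to choose.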
1572
1573
1574 #### Optional additions
1575
1576 * Add metadata support to composite data types, symbols and native
1577   functions. TODO
1578 * Add the following new core functions:
1579   * `time-ms`: takes no arguments and returns the number of
1580     milliseconds since epoch (00:00:00 UTC January 1, 1970), or, if
1581     not possible, since another point in time (`time-ms` is usually
1582     used relatively to measure time durations).  After `time-ms` is
1583     implemented, you can run the mal implementation performance
1584     benchmarks by running `make perf^quux`.
1585   * `conj`: takes a collection and one or more elements as arguments
1586     and returns a new collection which includes the original
1587     collection and the new elements.  If the collection is a list, a
1588     new list is returned with the elements inserted at the start of
1589     the given list in opposite order; if the collection is a vector, a
1590     new vector is returned with the elements added to the end of the
1591     given vector.
1592   * `string?`: returns true if the parameter is a string.
1593   * `seq`: takes a list, vector, string, or nil. If an empty list,
1594     empty vector, or empty string ("") is passed in then nil is
1595     returned. Otherwise, a list is returned unchanged, a vector is
    converted into a list, and a string is converted to a list
    containing the original string split into single-character
1598     strings.
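
The ordering rules for `conj` and the conversion rules for `seq` are easy to get backwards, so here is a small Python sketch of both. As before, it assumes mal lists are Python lists, mal vectors are a hypothetical `Vector` subclass, mal strings are Python strings, and `nil` is `None`.

```python
class Vector(list):
    """Hypothetical stand-in for a mal vector."""


def conj(coll, *elems):
    if isinstance(coll, Vector):
        # Vectors grow at the end, keeping the argument order.
        return Vector(list(coll) + list(elems))
    # Lists grow at the front one element at a time, so the new
    # elements end up in the opposite order.
    return list(reversed(elems)) + list(coll)


def seq(obj):
    # nil and empty collections/strings all become nil.
    if obj is None or len(obj) == 0:
        return None
    if isinstance(obj, str):
        # Split the string into single-character strings.
        return list(obj)
    if isinstance(obj, Vector):
        # Vectors are converted to plain lists.
        return list(obj)
    # Non-empty lists are returned unchanged.
    return obj
```

For example, `conj([1, 2], 3, 4)` gives `[4, 3, 1, 2]`, while `conj(Vector([1, 2]), 3, 4)` gives a vector holding `1, 2, 3, 4`.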
1599
1600
1601 ## TODO:
1602
1603 * simplify: "X argument (list element Y)" -> ast[Y]
1604 * list of types with metadata: list, vector, hash-map, mal functions
1605 * more clarity about when to peek and poke in read_list and read_form
1606 * tokenizer: use first group rather than whole match (to eliminate
1607   whitespace/commas)