| 1 | # The Make-A-Lisp Process |
| 2 | |
| 3 | So you want to write a Lisp interpreter? Welcome! |
| 4 | |
| 5 | The goal of the Make-A-Lisp project is to make it easy to write your |
| 6 | own Lisp interpreter without sacrificing those many "Aha!" moments |
| 7 | that come from ascending the McCarthy mountain. When you reach the peak |
| 8 | of this particular mountain, you will have an interpreter for the mal |
| 9 | Lisp language that is powerful enough to be self-hosting, meaning it |
| 10 | will be able to run a mal interpreter written in mal itself. |
| 11 | |
| 12 | So jump right in (er ... start the climb)! |
| 13 | |
| 14 | - [Pick a language](#pick-a-language) |
| 15 | - [Getting started](#getting-started) |
| 16 | - [General hints](#general-hints) |
| 17 | - [The Make-A-Lisp Process](#the-make-a-lisp-process-1) |
| 18 | - [Step 0: The REPL](#step-0-the-repl) |
| 19 | - [Step 1: Read and Print](#step-1-read-and-print) |
| 20 | - [Step 2: Eval](#step-2-eval) |
| 21 | - [Step 3: Environments](#step-3-environments) |
| 22 | - [Step 4: If Fn Do](#step-4-if-fn-do) |
| 23 | - [Step 5: Tail call optimization](#step-5-tail-call-optimization) |
| 24 | - [Step 6: Files, Mutation, and Evil](#step-6-files-mutation-and-evil) |
| 25 | - [Step 7: Quoting](#step-7-quoting) |
| 26 | - [Step 8: Macros](#step-8-macros) |
| 27 | - [Step 9: Try](#step-9-try) |
| 28 | - [Step A: Metadata, Self-hosting and Interop](#step-a-metadata-self-hosting-and-interop) |
| 29 | |
| 30 | |
| 31 | ## Pick a language |
| 32 | |
| 33 | You might already have a language in mind that you want to use. |
| 34 | Technically speaking, mal can be implemented in any sufficiently |
| 35 | complete programming language (i.e. Turing complete), however, there |
| 36 | are a few language features that can make the task MUCH easier. Here |
| 37 | are some of them in rough order of importance: |
| 38 | |
| 39 | * A sequential compound data structure (e.g. arrays, lists, |
| 40 | vectors, etc) |
| 41 | * An associative compound data structure (e.g. a dictionary, |
| 42 | hash-map, associative array, etc) |
| 43 | * Function references (first class functions, function pointers, |
| 44 | etc) |
| 45 | * Real exception handling (try/catch, raise, throw, etc) |
| 46 | * Variable argument functions (variadic, var args, splats, apply, etc) |
| 47 | * Function closures |
| 48 | * PCRE regular expressions |
| 49 | |
| 50 | In addition, the following will make your task especially easy: |
| 51 | |
| 52 | * Dynamic typing / boxed types (specifically, the ability to store |
| 53 | different data types in the sequential and associative structures |
| 54 | and the language keeps track of the type for you) |
| 55 | * Compound data types support arbitrary runtime "hidden" data |
| 56 | (metadata, metatables, dynamic fields attributes) |
| 57 | |
| 58 | Here are some examples of languages that have all of the above |
| 59 | features: JavaScript, Ruby, Python, Lua, R, Clojure. |
| 60 | |
| 61 | Michael Fogus has some great blog posts on interesting but less well |
| 62 | known languages and many of the languages on his lists do not yet have |
| 63 | any mal implementations: |
| 64 | * http://blog.fogus.me/2011/08/14/perlis-languages/ |
| 65 | * http://blog.fogus.me/2011/10/18/programming-language-development-the-past-5-years/ |
| 66 | |
| 67 | Many of the most popular languages already have Mal implementations. |
| 68 | However, this should not discourage you from creating your own |
| 69 | implementation in a language that already has one. However, if you go |
| 70 | this route, I suggest you avoid referring to the existing |
| 71 | implementations (i.e. "cheating") to maximize your learning experience |
| 72 | instead of just borrowing mine. On the other hand, if your goal is to |
| 73 | add new implementations to mal as efficiently as possible, then you |
| 74 | SHOULD find the most similar target language implementation and refer |
| 75 | to it frequently. |
| 76 | |
| 77 | If you want a list of programming languages with an |
| 78 | approximate measure of popularity try the [RedMonk Programming |
| 79 | Language |
| 80 | Rankings](https://redmonk.com/sogrady/2019/03/20/language-rankings-1-19/) |
| 81 | or the [GitHut 2.0 Project](https://madnight.github.io/githut). |
| 82 | |
| 83 | |
| 84 | ## Getting started |
| 85 | |
| 86 | * Install your chosen language interpreter/compiler, language package |
| 87 | manager and build tools (if applicable) |
| 88 | |
| 89 | * Fork the mal repository on github and then clone your forked |
| 90 | repository: |
| 91 | ``` |
| 92 | git clone git@github.com:YOUR_NAME/mal.git |
| 93 | cd mal |
| 94 | ``` |
| 95 | |
| 96 | * Make a new directory for your implementation. For example, if your |
| 97 | language is called "quux": |
| 98 | ``` |
| 99 | mkdir quux |
| 100 | ``` |
| 101 | |
| 102 | * Modify the top level Makefile to allow the tests to be run against |
| 103 | your implementation. For example, if your language is named "quux" |
| 104 | and uses "qx" as the file extension, then make the following |
| 105 | 3 modifications to Makefile: |
| 106 | ``` |
| 107 | IMPLS = ... quux ... |
| 108 | ... |
| 109 | quux_STEP_TO_PROG = mylang/$($(1)).qx |
| 110 | ``` |
| 111 | |
| 112 | * Add a "run" script to your implementation directory that listens to |
| 113 | the "STEP" environment variable for the implementation step to run |
| 114 | and defaults to "stepA_mal". Make sure the run script has the |
| 115 | executable file permission set (or else the test runner might fail with a |
| 116 | permission denied error message). The following are examples of "run" |
| 117 | scripts for a compiled language and an interpreted language (where |
| 118 | the interpreter is named "quux"): |
| 119 | |
| 120 | ``` |
| 121 | #!/bin/bash |
| 122 | exec $(dirname $0)/${STEP:-stepA_mal} "${@}" |
| 123 | ``` |
| 124 | |
| 125 | ``` |
| 126 | #!/bin/bash |
| 127 | exec quux $(dirname $0)/${STEP:-stepA_mal}.qx "${@}" |
| 128 | ``` |
| 129 | |
| 130 | This allows you to run tests against your implementation like this: |
| 131 | ``` |
| 132 | make "test^quux^stepX" |
| 133 | ``` |
| 134 | |
| 135 | If your implementation language is a compiled language, then you |
| 136 | should also add a Makefile at the top level of your implementation |
| 137 | directory. This Makefile will define how to build the files pointed to |
| 138 | by the quux_STEP_TO_PROG macro. The top-level Makefile will attempt to |
| 139 | build those targets before running tests. If it is a scripting |
| 140 | language/uncompiled, then no Makefile is necessary because |
| 141 | quux_STEP_TO_PROG will point to a source file that already exists and |
| 142 | does not need to be compiled/built. |
| 143 | |
| 144 | |
| 145 | ## General hints |
| 146 | |
| 147 | Stackoverflow and Google are your best friends. Modern polyglot |
| 148 | developers do not memorize dozens of programming languages. Instead, |
| 149 | they learn the peculiar terminology used with each language and then |
| 150 | use this to search for their answers. |
| 151 | |
| 152 | Here are some other resources where multiple languages are |
| 153 | compared/described: |
| 154 | * http://learnxinyminutes.com/ |
| 155 | * http://hyperpolyglot.org/ |
| 156 | * http://rosettacode.org/ |
| 157 | * http://rigaux.org/language-study/syntax-across-languages/ |
| 158 | |
| 159 | Do not let yourself be bogged down by specific problems. While the |
| 160 | make-a-lisp process is structured as a series of steps, the reality is |
| 161 | that building a lisp interpreter is more like a branching tree. If you |
| 162 | get stuck on tail call optimization, or hash-maps, move on to other |
| 163 | things. You will often have a stroke of inspiration for a problem as |
| 164 | you work through other functionality. I have tried to structure this |
| 165 | guide and the tests to make clear which things can be deferred until |
| 166 | later. |
| 167 | |
| 168 | An aside on deferrable/optional bits: when you run the tests for |
| 169 | a given step, the last tests are often marked with an "optional" |
| 170 | header. This indicates that these are tests for functionality that is |
| 171 | not critical to finish a basic mal implementation. Many of the steps |
| 172 | in this process guide have a "Deferrable" section, however, it is not |
| 173 | quite the same meaning. Those sections include the functionality that |
| 174 | is marked as optional in the tests, but they also include |
| 175 | functionality that becomes mandatory at a later step. In other words, |
| 176 | this is a "make your own Lisp adventure". |
| 177 | |
| 178 | Use test driven development. Each step of the make-a-lisp process has |
| 179 | a bunch of tests associated with it and there is an easy script to run |
| 180 | all the tests for a specific step in the process. Pick a failing test, |
| 181 | fix it, repeat until all the tests for that step pass. |
| 182 | |
| 183 | ## Reference Code |
| 184 | |
| 185 | The `process` directory contains abbreviated pseudocode and |
| 186 | architecture diagrams for each step of the make-a-lisp process. Use |
| 187 | a textual diff/comparison tool to compare the previous pseudocode step |
| 188 | with the one you are working on. The architecture diagram images have |
| 189 | changes from the previous step highlighted in red. There is also |
| 190 | a concise |
| 191 | [cheatsheet](http://kanaka.github.io/mal/cheatsheet.html) that |
| 192 | summarizes the key changes at each step. |
| 193 | |
| 194 | If you get completely stuck and are feeling like giving up, then you |
| 195 | should "cheat" by referring to the same step or functionality in |
| 196 | a existing implementation language. You are here to learn, not to take |
| 197 | a test, so do not feel bad about it. Okay, you should feel a little |
| 198 | bit bad about it. |
| 199 | |
| 200 | |
| 201 | ## The Make-A-Lisp Process |
| 202 | |
| 203 | Feel free to follow the guide as literally or as loosely as you |
| 204 | like. You are here to learn; wandering off the beaten path may be the |
| 205 | way you learn best. However, each step builds on the previous steps, |
| 206 | so if you are new to Lisp or new to your implementation language then |
| 207 | you may want to stick more closely to the guide your first time |
| 208 | through to avoid frustration at later steps. |
| 209 | |
| 210 | In the steps that follow the name of the target language is "quux" and |
| 211 | the file extension for that language is "qx". |
| 212 | |
| 213 | |
| 214 | <a name="step0"></a> |
| 215 | |
| 216 | ### Step 0: The REPL |
| 217 | |
| 218 | ![step0_repl architecture](step0_repl.png) |
| 219 | |
| 220 | This step is basically just creating a skeleton of your interpreter. |
| 221 | |
| 222 | * Create a `step0_repl.qx` file in `quux/`. |
| 223 | |
| 224 | * Add the 4 trivial functions `READ`, `EVAL`, `PRINT`, and `rep` |
| 225 | (read-eval-print). `READ`, `EVAL`, and `PRINT` are basically just |
| 226 | stubs that return their first parameter (a string if your target |
| 227 | language is a statically typed) and `rep` calls them in order |
| 228 | passing the return to the input of the next. |
| 229 | |
| 230 | * Add a main loop that repeatedly prints a prompt (needs to be |
| 231 | "user> " for later tests to pass), gets a line of input from the |
| 232 | user, calls `rep` with that line of input, and then prints out the |
| 233 | result from `rep`. It should also exit when you send it an EOF |
| 234 | (often Ctrl-D). |
| 235 | |
| 236 | * If you are using a compiled (ahead-of-time rather than just-in-time) |
| 237 | language, then create a Makefile (or appropriate project definition |
| 238 | file) in your directory. |
| 239 | |
| 240 | It is time to run your first tests. This will check that your program |
| 241 | does input and output in a way that can be captured by the test |
| 242 | harness. Go to the top level and run the following: |
| 243 | ``` |
| 244 | make "test^quux^step0" |
| 245 | ``` |
| 246 | |
| 247 | Add and then commit your new `step0_repl.qx` and `Makefile` to git. |
| 248 | |
| 249 | Congratulations! You have just completed the first step of the |
| 250 | make-a-lisp process. |
| 251 | |
| 252 | |
| 253 | #### Optional: |
| 254 | |
| 255 | * Add full line editing and command history support to your |
| 256 | interpreter REPL. Many languages have a library/module that provide |
| 257 | line editing support. Another option if your language supports it is |
| 258 | to use an FFI (foreign function interface) to load and call directly |
| 259 | into GNU readline, editline, or linenoise library. Add line |
| 260 | editing interface code to `readline.qx` |
| 261 | |
| 262 | |
| 263 | <a name="step1"></a> |
| 264 | |
| 265 | ### Step 1: Read and Print |
| 266 | |
| 267 | ![step1_read_print architecture](step1_read_print.png) |
| 268 | |
| 269 | In this step, your interpreter will "read" the string from the user |
| 270 | and parse it into an internal tree data structure (an abstract syntax |
| 271 | tree) and then take that data structure and "print" it back to |
| 272 | a string. |
| 273 | |
| 274 | In non-lisp languages, this step (called "lexing and parsing") can be |
| 275 | one of the most complicated parts of the compiler/interpreter. In |
| 276 | Lisp, the data structure that you want in memory is basically |
| 277 | represented directly in the code that the programmer writes |
| 278 | (homoiconicity). |
| 279 | |
| 280 | For example, if the string is "(+ 2 (* 3 4))" then the read function |
| 281 | will process this into a tree structure that looks like this: |
| 282 | ``` |
| 283 | List |
| 284 | / | \ |
| 285 | / | \ |
| 286 | / | \ |
| 287 | Sym:+ Int:2 List |
| 288 | / | \ |
| 289 | / | \ |
| 290 | / | \ |
| 291 | Sym:* Int:3 Int:4 |
| 292 | ``` |
| 293 | |
| 294 | Each left paren and its matching right paren (lisp "sexpr") becomes |
| 295 | a node in the tree and everything else becomes a leaf in the tree. |
| 296 | |
| 297 | If you can find code for an implementation of a JSON encoder/decoder |
| 298 | in your target language then you can probably just borrow and modify |
| 299 | that and be 75% of the way done with this step. |
| 300 | |
| 301 | The rest of this section is going to assume that you are not starting |
| 302 | from an existing JSON encoder/decoder, but that you do have access to |
| 303 | a Perl compatible regular expressions (PCRE) module/library. You can |
| 304 | certainly implement the reader using simple string operations, but it |
| 305 | is more involved. The `make`, `ps` (postscript) and Haskell |
| 306 | implementations have examples of a reader/parser without using regular |
| 307 | expression support. |
| 308 | |
| 309 | * Copy `step0_repl.qx` to `step1_read_print.qx`. |
| 310 | |
| 311 | * Add a `reader.qx` file to hold functions related to the reader. |
| 312 | |
| 313 | * If the target language has objects types (OOP), then the next step |
| 314 | is to create a simple stateful Reader object in `reader.qx`. This |
| 315 | object will store the tokens and a position. The Reader object will |
| 316 | have two methods: `next` and `peek`. `next` returns the token at |
| 317 | the current position and increments the position. `peek` just |
| 318 | returns the token at the current position. |
| 319 | |
| 320 | * Add a function `read_str` in `reader.qx`. This function |
| 321 | will call `tokenize` and then create a new Reader object instance |
| 322 | with the tokens. Then it will call `read_form` with the Reader |
| 323 | instance. |
| 324 | |
| 325 | * Add a function `tokenize` in `reader.qx`. This function will take |
| 326 | a single string and return an array/list |
| 327 | of all the tokens (strings) in it. The following regular expression |
| 328 | (PCRE) will match all mal tokens. |
| 329 | ``` |
| 330 | [\s,]*(~@|[\[\]{}()'`~^@]|"(?:\\.|[^\\"])*"?|;.*|[^\s\[\]{}('"`,;)]*) |
| 331 | ``` |
| 332 | * For each match captured within the parenthesis starting at char 6 of the |
| 333 | regular expression a new token will be created. |
| 334 | |
| 335 | * `[\s,]*`: Matches any number of whitespaces or commas. This is not captured |
| 336 | so it will be ignored and not tokenized. |
| 337 | |
| 338 | * `~@`: Captures the special two-characters `~@` (tokenized). |
| 339 | |
| 340 | * ```[\[\]{}()'`~^@]```: Captures any special single character, one of |
| 341 | ```[]{}()'`~^@``` (tokenized). |
| 342 | |
| 343 | * `"(?:\\.|[^\\"])*"?`: Starts capturing at a double-quote and stops at the |
| 344 | next double-quote unless it was preceded by a backslash in which case it |
| 345 | includes it until the next double-quote (tokenized). It will also |
| 346 | match unbalanced strings (no ending double-quote) which should be |
| 347 | reported as an error. |
| 348 | |
| 349 | * `;.*`: Captures any sequence of characters starting with `;` (tokenized). |
| 350 | |
| 351 | * ```[^\s\[\]{}('"`,;)]*```: Captures a sequence of zero or more non special |
| 352 | characters (e.g. symbols, numbers, "true", "false", and "nil") and is sort |
| 353 | of the inverse of the one above that captures special characters (tokenized). |
| 354 | |
| 355 | * Add the function `read_form` to `reader.qx`. This function |
| 356 | will peek at the first token in the Reader object and switch on the |
| 357 | first character of that token. If the character is a left paren then |
| 358 | `read_list` is called with the Reader object. Otherwise, `read_atom` |
| 359 | is called with the Reader Object. The return value from `read_form` |
| 360 | is a mal data type. If your target language is statically typed then |
| 361 | you will need some way for `read_form` to return a variant or |
| 362 | subclass type. For example, if your language is object oriented, |
| 363 | then you can define a top level MalType (in `types.qx`) that all |
| 364 | your mal data types inherit from. The MalList type (which also |
| 365 | inherits from MalType) will contain a list/array of other MalTypes. |
| 366 | If your language is dynamically typed then you can likely just |
| 367 | return a plain list/array of other mal types. |
| 368 | |
| 369 | * Add the function `read_list` to `reader.qx`. This function will |
| 370 | repeatedly call `read_form` with the Reader object until it |
| 371 | encounters a ')' token (if it reach EOF before reading a ')' then |
| 372 | that is an error). It accumulates the results into a List type. If |
| 373 | your language does not have a sequential data type that can hold mal |
| 374 | type values you may need to implement one (in `types.qx`). Note |
| 375 | that `read_list` repeatedly calls `read_form` rather than |
| 376 | `read_atom`. This mutually recursive definition between `read_list` |
| 377 | and `read_form` is what allows lists to contain lists. |
| 378 | |
| 379 | * Add the function `read_atom` to `reader.qx`. This function will |
| 380 | look at the contents of the token and return the appropriate scalar |
| 381 | (simple/single) data type value. Initially, you can just implement |
| 382 | numbers (integers) and symbols. This will allow you to proceed |
| 383 | through the next couple of steps before you will need to implement |
| 384 | the other fundamental mal types: nil, true, false, and string. The |
| 385 | remaining scalar mal type, keyword does not |
| 386 | need to be implemented until step A (but can be implemented at any |
| 387 | point between this step and that). BTW, symbols types are just an |
| 388 | object that contains a single string name value (some languages have |
| 389 | symbol types already). |
| 390 | |
| 391 | * Add a file `printer.qx`. This file will contain a single function |
| 392 | `pr_str` which does the opposite of `read_str`: take a mal data |
| 393 | structure and return a string representation of it. But `pr_str` is |
| 394 | much simpler and is basically just a switch statement on the type of |
| 395 | the input object: |
| 396 | |
| 397 | * symbol: return the string name of the symbol |
| 398 | * number: return the number as a string |
| 399 | * list: iterate through each element of the list calling `pr_str` on |
| 400 | it, then join the results with a space separator, and surround the |
| 401 | final result with parens |
| 402 | |
| 403 | * Change the `READ` function in `step1_read_print.qx` to call |
| 404 | `reader.read_str` and the `PRINT` function to call `printer.pr_str`. |
| 405 | `EVAL` continues to simply return its input but the type is now |
| 406 | a mal data type. |
| 407 | |
| 408 | You now have enough hooked up to begin testing your code. You can |
| 409 | manually try some simple inputs: |
| 410 | * `123` -> `123` |
| 411 | * ` 123 ` -> `123` |
| 412 | * `abc` -> `abc` |
| 413 | * ` abc ` -> `abc` |
| 414 | * `(123 456)` -> `(123 456)` |
| 415 | * `( 123 456 789 ) ` -> `(123 456 789)` |
| 416 | * `( + 2 (* 3 4) ) ` -> `(+ 2 (* 3 4))` |
| 417 | |
| 418 | To verify that your code is doing more than just eliminating extra |
| 419 | spaces (and not failing), you can instrument your `reader.qx` functions. |
| 420 | |
| 421 | Once you have gotten past those simple manual tests, it is time to run |
| 422 | the full suite of step 1 tests. Go to the top level and run the |
| 423 | following: |
| 424 | ``` |
| 425 | make "test^quux^step1" |
| 426 | ``` |
| 427 | |
| 428 | Fix any test failures related to symbols, numbers and lists. |
| 429 | |
| 430 | Depending on the functionality of your target language, it is likely |
| 431 | that you have now just completed one of the most difficult steps. It |
| 432 | is down hill from here. The remaining steps will probably be easier |
| 433 | and each step will give progressively more bang for the buck. |
| 434 | |
| 435 | #### Deferrable: |
| 436 | |
| 437 | |
| 438 | * Add support for the other basic data type to your reader and printer |
| 439 | functions: string, nil, true, and false. Nil, true, and false |
| 440 | become mandatory at step 4, strings at step 6. When a string is read, |
| 441 | the following transformations are |
| 442 | applied: a backslash followed by a doublequote is translated into |
| 443 | a plain doublequote character, a backslash followed by "n" is |
| 444 | translated into a newline, and a backslash followed by another |
| 445 | backslash is translated into a single backslash. To properly print |
| 446 | a string (for step 4 string functions), the `pr_str` function needs |
| 447 | another parameter called `print_readably`. When `print_readably` is |
| 448 | true, doublequotes, newlines, and backslashes are translated into |
| 449 | their printed representations (the reverse of the reader). The |
| 450 | `PRINT` function in the main program should call `pr_str` with |
| 451 | print_readably set to true. |
| 452 | |
| 453 | * Add error checking to your reader functions to make sure parens |
| 454 | are properly matched. Catch and print these errors in your main |
| 455 | loop. If your language does not have try/catch style bubble up |
| 456 | exception handling, then you will need to add explicit error |
| 457 | handling to your code to catch and pass on errors without crashing. |
| 458 | |
| 459 | * Add support for reader macros which are forms that are |
| 460 | transformed into other forms during the read phase. Refer to |
| 461 | `tests/step1_read_print.mal` for the form that these macros should |
| 462 | take (they are just simple transformations of the token stream). |
| 463 | |
| 464 | * Add support for the other mal types: keyword, vector, hash-map. |
| 465 | * keyword: a keyword is a token that begins with a colon. A keyword |
| 466 | can just be stored as a string with special unicode prefix like |
| 467 | 0x29E (or char 0xff/127 if the target language does not have good |
| 468 | unicode support) and the printer translates strings with that |
| 469 | prefix back to the keyword representation. This makes it easy to |
| 470 | use keywords as hash map keys in most languages. You can also |
| 471 | store keywords as a unique data type, but you will need to make |
| 472 | sure they can be used as hash map keys (which may involve doing |
| 473 | a similar prefixed translation anyways). |
| 474 | * vector: a vector can be implemented with same underlying |
| 475 | type as a list as long as there is some mechanism to keep track of |
| 476 | the difference. You can use the same reader function for both |
| 477 | lists and vectors by adding parameters for the starting and ending |
| 478 | tokens. |
| 479 | * hash-map: a hash-map is an associative data structure that maps |
| 480 | strings to other mal values. If you implement keywords as prefixed |
| 481 | strings, then you only need a native associative data structure |
| 482 | which supports string keys. Clojure allows any value to be a hash |
| 483 | map key, but the base functionality in mal is to support strings |
| 484 | and keyword keys. Because of the representation of hash-maps as |
| 485 | an alternating sequence of keys and values, you can probably use |
| 486 | the same reader function for hash-maps as lists and vectors with |
| 487 | parameters to indicate the starting and ending tokens. The odd |
| 488 | tokens are then used for keys with the corresponding even tokens |
| 489 | as the values. |
| 490 | |
| 491 | * Add comment support to your reader. The tokenizer should ignore |
| 492 | tokens that start with ";". Your `read_str` function will need to |
| 493 | properly handle when the tokenizer returns no values. The simplest |
| 494 | way to do this is to return `nil` mal value. A cleaner option (that |
| 495 | does not print `nil` at the prompt is to throw a special exception |
| 496 | that causes the main loop to simply continue at the beginning of the |
| 497 | loop without calling `rep`. |
| 498 | |
| 499 | |
| 500 | <a name="step2"></a> |
| 501 | |
| 502 | ### Step 2: Eval |
| 503 | |
| 504 | ![step2_eval architecture](step2_eval.png) |
| 505 | |
| 506 | In step 1 your mal interpreter was basically just a way to validate |
| 507 | input and eliminate extraneous white space. In this step you will turn |
| 508 | your interpreter into a simple number calculator by adding |
| 509 | functionality to the evaluator (`EVAL`). |
| 510 | |
| 511 | Compare the pseudocode for step 1 and step 2 to get a basic idea of |
| 512 | the changes that will be made during this step: |
| 513 | ``` |
| 514 | diff -urp ../process/step1_read_print.txt ../process/step2_eval.txt |
| 515 | ``` |
| 516 | |
| 517 | * Copy `step1_read_print.qx` to `step2_eval.qx`. |
| 518 | |
| 519 | * Define a simple initial REPL environment. This environment is an |
| 520 | associative structure that maps symbols (or symbol names) to |
| 521 | numeric functions. For example, in python this would look something |
| 522 | like this: |
| 523 | ``` |
| 524 | repl_env = {'+': lambda a,b: a+b, |
| 525 | '-': lambda a,b: a-b, |
| 526 | '*': lambda a,b: a*b, |
| 527 | '/': lambda a,b: int(a/b)} |
| 528 | ``` |
| 529 | |
| 530 | * Modify the `rep` function to pass the REPL environment as the second |
| 531 | parameter for the `EVAL` call. |
| 532 | |
| 533 | * Create a new function `eval_ast` which takes `ast` (mal data type) |
| 534 | and an associative structure (the environment from above). |
| 535 | `eval_ast` switches on the type of `ast` as follows: |
| 536 | |
| 537 | * symbol: lookup the symbol in the environment structure and return |
| 538 | the value or raise an error if no value is found |
| 539 | * list: return a new list that is the result of calling `EVAL` on |
| 540 | each of the members of the list |
| 541 | * otherwise just return the original `ast` value |
| 542 | |
| 543 | * Modify `EVAL` to check if the first parameter `ast` is a list. |
| 544 | * `ast` is not a list: then return the result of calling `eval_ast` |
| 545 | on it. |
| 546 | * `ast` is a empty list: return ast unchanged. |
| 547 | * `ast` is a list: call `eval_ast` to get a new evaluated list. Take |
| 548 | the first item of the evaluated list and call it as function using |
| 549 | the rest of the evaluated list as its arguments. |
| 550 | |
| 551 | If your target language does not have full variable length argument |
| 552 | support (e.g. variadic, vararg, splats, apply) then you will need to |
| 553 | pass the full list of arguments as a single parameter and split apart |
| 554 | the individual values inside of every mal function. This is annoying, |
| 555 | but workable. |
| 556 | |
| 557 | The process of taking a list and invoking or executing it to return |
| 558 | something new is known in Lisp as the "apply" phase. |
| 559 | |
| 560 | Try some simple expressions: |
| 561 | |
| 562 | * `(+ 2 3)` -> `5` |
| 563 | * `(+ 2 (* 3 4))` -> `14` |
| 564 | |
| 565 | The most likely challenge you will encounter is how to properly call |
| 566 | a function references using an arguments list. |
| 567 | |
| 568 | Now go to the top level, run the step 2 tests and fix the errors. |
| 569 | ``` |
| 570 | make "test^quux^step2" |
| 571 | ``` |
| 572 | |
| 573 | You now have a simple prefix notation calculator! |
| 574 | |
| 575 | #### Deferrable: |
| 576 | |
| 577 | * `eval_ast` should evaluate elements of vectors and hash-maps. Add the |
| 578 | following cases in `eval_ast`: |
| 579 | * If `ast` is a vector: return a new vector that is the result of calling |
| 580 | `EVAL` on each of the members of the vector. |
| 581 | * If `ast` is a hash-map: return a new hash-map which consists of key-value |
| 582 | pairs where the key is a key from the hash-map and the value is the result |
| 583 | of calling `EVAL` on the corresponding value. |
| 584 | |
| 585 | |
| 586 | <a name="step3"></a> |
| 587 | |
| 588 | ### Step 3: Environments |
| 589 | |
| 590 | ![step3_env architecture](step3_env.png) |
| 591 | |
| 592 | In step 2 you were already introduced to REPL environment (`repl_env`) |
| 593 | where the basic numeric functions were stored and looked up. In this |
| 594 | step you will add the ability to create new environments (`let*`) and |
| 595 | modify existing environments (`def!`). |
| 596 | |
| 597 | A Lisp environment is an associative data structure that maps symbols (the |
| 598 | keys) to values. But Lisp environments have an additional important |
| 599 | function: they can refer to another environment (the outer |
| 600 | environment). During environment lookups, if the current environment |
| 601 | does not have the symbol, the lookup continues in the outer |
| 602 | environment, and continues this way until the symbol is either found, |
| 603 | or the outer environment is `nil` (the outermost environment in the |
| 604 | chain). |
| 605 | |
| 606 | Compare the pseudocode for step 2 and step 3 to get a basic idea of |
| 607 | the changes that will be made during this step: |
| 608 | ``` |
| 609 | diff -urp ../process/step2_eval.txt ../process/step3_env.txt |
| 610 | ``` |
| 611 | |
| 612 | * Copy `step2_eval.qx` to `step3_env.qx`. |
| 613 | |
| 614 | * Create `env.qx` to hold the environment definition. |
| 615 | |
| 616 | * Define an `Env` object that is instantiated with a single `outer` |
| 617 | parameter and starts with an empty associative data structure |
| 618 | property `data`. |
| 619 | |
| 620 | * Define three methods for the Env object: |
| 621 | * set: takes a symbol key and a mal value and adds to the `data` |
| 622 | structure |
| 623 | * find: takes a symbol key and if the current environment contains |
| 624 | that key then return the environment. If no key is found and outer |
| 625 | is not `nil` then call find (recurse) on the outer environment. |
| 626 | * get: takes a symbol key and uses the `find` method to locate the |
| 627 | environment with the key, then returns the matching value. If no |
| 628 | key is found up the outer chain, then throws/raises a "not found" |
| 629 | error. |
| 630 | |
| 631 | * Update `step3_env.qx` to use the new `Env` type to create the |
| 632 | repl_env (with a `nil` outer value) and use the `set` method to add |
| 633 | the numeric functions. |
| 634 | |
| 635 | * Modify `eval_ast` to call the `get` method on the `env` parameter. |
| 636 | |
| 637 | * Modify the apply section of `EVAL` to switch on the first element of |
| 638 | the list: |
| 639 | * symbol "def!": call the set method of the current environment |
| 640 | (second parameter of `EVAL` called `env`) using the unevaluated |
| 641 | first parameter (second list element) as the symbol key and the |
| 642 | evaluated second parameter as the value. |
| 643 | * symbol "let\*": create a new environment using the current |
| 644 | environment as the outer value and then use the first parameter as |
| 645 | a list of new bindings in the "let\*" environment. Take the second |
| 646 | element of the binding list, call `EVAL` using the new "let\*" |
| 647 | environment as the evaluation environment, then call `set` on the |
| 648 | "let\*" environment using the first binding list element as the key |
| 649 | and the evaluated second element as the value. This is repeated |
| 650 | for each odd/even pair in the binding list. Note in particular, |
| 651 | the bindings earlier in the list can be referred to by later |
| 652 | bindings. Finally, the second parameter (third element) of the |
| 653 | original `let*` form is evaluated using the new "let\*" environment |
| 654 | and the result is returned as the result of the `let*` (the new |
| 655 | let environment is discarded upon completion). |
| 656 | * otherwise: call `eval_ast` on the list and apply the first element |
| 657 | to the rest as before. |
| 658 | |
| 659 | `def!` and `let*` are Lisp "specials" (or "special atoms") which means |
| 660 | that they are language level features and more specifically that the |
| 661 | rest of the list elements (arguments) may be evaluated differently (or |
| 662 | not at all) unlike the default apply case where all elements of the |
| 663 | list are evaluated before the first element is invoked. Lists which |
| 664 | contain a "special" as the first element are known as "special forms". |
| 665 | They are special because they follow special evaluation rules. |
| 666 | |
| 667 | Try some simple environment tests: |
| 668 | |
| 669 | * `(def! a 6)` -> `6` |
| 670 | * `a` -> `6` |
| 671 | * `(def! b (+ a 2))` -> `8` |
| 672 | * `(+ a b)` -> `14` |
| 673 | * `(let* (c 2) c)` -> `2` |
| 674 | |
| 675 | Now go to the top level, run the step 3 tests and fix the errors. |
| 676 | ``` |
| 677 | make "test^quux^step3" |
| 678 | ``` |
| 679 | |
| 680 | You mal implementation is still basically just a numeric calculator |
| 681 | with save/restore capability. But you have set the foundation for step |
| 682 | 4 where it will begin to feel like a real programming language. |
| 683 | |
| 684 | |
| 685 | An aside on mutation and typing: |
| 686 | |
| 687 | The "!" suffix on symbols is used to indicate that this symbol refers |
| 688 | to a function that mutates something else. In this case, the `def!` |
| 689 | symbol indicates a special form that will mutate the current |
| 690 | environment. Many (maybe even most) of runtime problems that are |
| 691 | encountered in software engineering are a result of mutation. By |
| 692 | clearly marking code where mutation may occur, you can more easily |
| 693 | track down the likely cause of runtime problems when they do occur. |
| 694 | |
| 695 | Another cause of runtime errors is type errors, where a value of one |
| 696 | type is unexpectedly treated by the program as a different and |
| 697 | incompatible type. Statically typed languages try to make the |
| 698 | programmer solve all type problems before the program is allowed to |
| 699 | run. Most Lisp variants tend to be dynamically typed (types of values |
| 700 | are checked when they are actually used at runtime). |
| 701 | |
| 702 | As an aside-aside: The great debate between static and dynamic typing |
| 703 | can be understood by following the money. Advocates of strict static |
| 704 | typing use words like "correctness" and "safety" and thus get |
| 705 | government and academic funding. Advocates of dynamic typing use words |
| 706 | like "agile" and "time-to-market" and thus get venture capital and |
| 707 | commercial funding. |
| 708 | |
| 709 | |
| 710 | <a name="step4"></a> |
| 711 | |
| 712 | ### Step 4: If Fn Do |
| 713 | |
| 714 | ![step4_if_fn_do architecture](step4_if_fn_do.png) |
| 715 | |
| 716 | In step 3 you added environments and the special forms for |
| 717 | manipulating environments. In this step you will add 3 new special |
| 718 | forms (`if`, `fn*` and `do`) and add several more core functions to |
| 719 | the default REPL environment. Our new architecture will look like |
| 720 | this: |
| 721 | |
| 722 | The `fn*` special form is how new user-defined functions are created. |
| 723 | In some Lisps, this special form is named "lambda". |
| 724 | |
| 725 | Compare the pseudocode for step 3 and step 4 to get a basic idea of |
| 726 | the changes that will be made during this step: |
| 727 | ``` |
| 728 | diff -urp ../process/step3_env.txt ../process/step4_if_fn_do.txt |
| 729 | ``` |
| 730 | |
| 731 | * Copy `step3_env.qx` to `step4_if_fn_do.qx`. |
| 732 | |
| 733 | * If you have not implemented reader and printer support (and data |
| 734 | types) for `nil`, `true` and `false`, you will need to do so for |
| 735 | this step. |
| 736 | |
| 737 | * Update the constructor/initializer for environments to take two new |
| 738 | arguments: `binds` and `exprs`. Bind (`set`) each element (symbol) |
| 739 | of the binds list to the respective element of the `exprs` list. |
| 740 | |
| 741 | * Add support to `printer.qx` to print functions values. A string |
| 742 | literal like "#\<function>" is sufficient. |
| 743 | |
| 744 | * Add the following special forms to `EVAL`: |
| 745 | |
| 746 | * `do`: Evaluate all the elements of the list using `eval_ast` |
| 747 | and return the final evaluated element. |
| 748 | * `if`: Evaluate the first parameter (second element). If the result |
| 749 | (condition) is anything other than `nil` or `false`, then evaluate |
| 750 | the second parameter (third element of the list) and return the |
| 751 | result. Otherwise, evaluate the third parameter (fourth element) |
| 752 | and return the result. If condition is false and there is no third |
| 753 | parameter, then just return `nil`. |
| 754 | * `fn*`: Return a new function closure. The body of that closure |
| 755 | does the following: |
| 756 | * Create a new environment using `env` (closed over from outer |
| 757 | scope) as the `outer` parameter, the first parameter (second |
| 758 | list element of `ast` from the outer scope) as the `binds` |
| 759 | parameter, and the parameters to the closure as the `exprs` |
| 760 | parameter. |
| 761 | * Call `EVAL` on the second parameter (third list element of `ast` |
| 762 | from outer scope), using the new environment. Use the result as |
| 763 | the return value of the closure. |
| 764 | |
| 765 | If your target language does not support closures, then you will need |
| 766 | to implement `fn*` using some sort of structure or object that stores |
| 767 | the values being closed over: the first and second elements of the |
| 768 | `ast` list (function parameter list and function body) and the current |
| 769 | environment `env`. In this case, your native functions will need to be |
| 770 | wrapped in the same way. You will probably also need a method/function |
| 771 | that invokes your function object/structure for the default case of |
| 772 | the apply section of `EVAL`. |
| 773 | |
| 774 | Try out the basic functionality you have implemented: |
| 775 | |
| 776 | * `(fn* (a) a)` -> `#<function>` |
| 777 | * `( (fn* (a) a) 7)` -> `7` |
| 778 | * `( (fn* (a) (+ a 1)) 10)` -> `11` |
| 779 | * `( (fn* (a b) (+ a b)) 2 3)` -> `5` |
| 780 | |
| 781 | * Add a new file `core.qx` and define an associative data structure |
| 782 | `ns` (namespace) that maps symbols to functions. Move the numeric |
| 783 | function definitions into this structure. |
| 784 | |
| 785 | * Modify `step4_if_fn_do.qx` to iterate through the `core.ns` |
| 786 | structure and add (`set`) each symbol/function mapping to the |
| 787 | REPL environment (`repl_env`). |
| 788 | |
| 789 | * Add the following functions to `core.ns`: |
| 790 | * `prn`: call `pr_str` on the first parameter with `print_readably` |
| 791 | set to true, prints the result to the screen and then return |
| 792 | `nil`. Note that the full version of `prn` is a deferrable below. |
| 793 | * `list`: take the parameters and return them as a list. |
| 794 | * `list?`: return true if the first parameter is a list, false |
| 795 | otherwise. |
| 796 | * `empty?`: treat the first parameter as a list and return true if |
| 797 | the list is empty and false if it contains any elements. |
| 798 | * `count`: treat the first parameter as a list and return the number |
| 799 | of elements that it contains. |
| 800 | * `=`: compare the first two parameters and return true if they are |
| 801 | the same type and contain the same value. In the case of equal |
| 802 | length lists, each element of the list should be compared for |
| 803 | equality and if they are the same return true, otherwise false. |
| 804 | * `<`, `<=`, `>`, and `>=`: treat the first two parameters as |
| 805 | numbers and do the corresponding numeric comparison, returning |
| 806 | either true or false. |
| 807 | |
| 808 | Now go to the top level, run the step 4 tests. There are a lot of |
| 809 | tests in step 4 but all of the non-optional tests that do not involve |
| 810 | strings should be able to pass now. |
| 811 | |
| 812 | ``` |
| 813 | make "test^quux^step4" |
| 814 | ``` |
| 815 | |
| 816 | Your mal implementation is already beginning to look like a real |
| 817 | language. You have flow control, conditionals, user-defined functions |
| 818 | with lexical scope, side-effects (if you implement the string |
| 819 | functions), etc. However, our little interpreter has not quite reached |
| 820 | Lisp-ness yet. The next several steps will take your implementation |
| 821 | from a neat toy to a full featured language. |
| 822 | |
| 823 | #### Deferrable: |
| 824 | |
| 825 | * Implement Clojure-style variadic function parameters. Modify the |
| 826 | constructor/initializer for environments, so that if a "&" symbol is |
| 827 | encountered in the `binds` list, the next symbol in the `binds` list |
| 828 | after the "&" is bound to the rest of the `exprs` list that has not |
| 829 | been bound yet. |
| 830 | |
| 831 | * Define a `not` function using mal itself. In `step4_if_fn_do.qx` |
| 832 | call the `rep` function with this string: |
| 833 | "(def! not (fn* (a) (if a false true)))". |
| 834 | |
| 835 | * Implement the strings functions in `core.qx`. To implement these |
| 836 | functions, you will need to implement the string support in the |
| 837 | reader and printer (deferrable section of step 1). Each of the string |
| 838 | functions takes multiple mal values, prints them (`pr_str`) and |
| 839 | joins them together into a new string. |
| 840 | * `pr-str`: calls `pr_str` on each argument with `print_readably` |
| 841 | set to true, joins the results with " " and returns the new |
| 842 | string. |
| 843 | * `str`: calls `pr_str` on each argument with `print_readably` set |
| 844 | to false, concatenates the results together ("" separator), and |
| 845 | returns the new string. |
| 846 | * `prn`: calls `pr_str` on each argument with `print_readably` set |
| 847 | to true, joins the results with " ", prints the string to the |
| 848 | screen and then returns `nil`. |
| 849 | * `println`: calls `pr_str` on each argument with `print_readably` set |
| 850 | to false, joins the results with " ", prints the string to the |
| 851 | screen and then returns `nil`. |
| 852 | |
| 853 | |
| 854 | <a name="step5"></a> |
| 855 | |
| 856 | ### Step 5: Tail call optimization |
| 857 | |
| 858 | ![step5_tco architecture](step5_tco.png) |
| 859 | |
| 860 | In step 4 you added special forms `do`, `if` and `fn*` and you defined |
| 861 | some core functions. In this step you will add a Lisp feature called |
| 862 | tail call optimization (TCO). Also called "tail recursion" or |
| 863 | sometimes just "tail calls". |
| 864 | |
| 865 | Several of the special forms that you have defined in `EVAL` end up |
| 866 | calling back into `EVAL`. For those forms that call `EVAL` as the last |
| 867 | thing that they do before returning (tail call) you will just loop back |
| 868 | to the beginning of eval rather than calling it again. The advantage |
| 869 | of this approach is that it avoids adding more frames to the call |
| 870 | stack. This is especially important in Lisp languages because they tend |
| 871 | to prefer using recursion instead of iteration for control structures. |
| 872 | (Though some Lisps, such as Common Lisp, have iteration.) However, with |
| 873 | tail call optimization, recursion can be made as stack efficient as |
| 874 | iteration. |
| 875 | |
| 876 | Compare the pseudocode for step 4 and step 5 to get a basic idea of |
| 877 | the changes that will be made during this step: |
| 878 | ``` |
| 879 | diff -urp ../process/step4_if_fn_do.txt ../process/step5_tco.txt |
| 880 | ``` |
| 881 | |
| 882 | * Copy `step4_if_fn_do.qx` to `step5_tco.qx`. |
| 883 | |
| 884 | * Add a loop (e.g. while true) around all code in `EVAL`. |
| 885 | |
| 886 | * Modify each of the following form cases to add tail call recursion |
| 887 | support: |
| 888 | * `let*`: remove the final `EVAL` call on the second `ast` argument |
| 889 | (third list element). Set `env` (i.e. the local variable passed in |
| 890 | as second parameter of `EVAL`) to the new let environment. Set |
| 891 | `ast` (i.e. the local variable passed in as first parameter of |
| 892 | `EVAL`) to be the second `ast` argument. Continue at the beginning |
| 893 | of the loop (no return). |
| 894 | * `do`: change the `eval_ast` call to evaluate all the parameters |
| 895 | except for the last (2nd list element up to but not including |
| 896 | last). Set `ast` to the last element of `ast`. Continue |
| 897 | at the beginning of the loop (`env` stays unchanged). |
| 898 | * `if`: the condition continues to be evaluated, however, rather |
| 899 | than evaluating the true or false branch, `ast` is set to the |
| 900 | unevaluated value of the chosen branch. Continue at the beginning |
| 901 | of the loop (`env` is unchanged). |
| 902 | |
| 903 | * The return value from the `fn*` special form will now become an |
| 904 | object/structure with attributes that allow the default invoke case |
| 905 | of `EVAL` to do TCO on mal functions. Those attributes are: |
| 906 | * `ast`: the second `ast` argument (third list element) representing |
| 907 | the body of the function. |
| 908 | * `params`: the first `ast` argument (second list element) |
| 909 | representing the parameter names of the function. |
| 910 | * `env`: the current value of the `env` parameter of `EVAL`. |
| 911 | * `fn`: the original function value (i.e. what was return by `fn*` |
| 912 | in step 4). Note that this is deferrable until step 9 when it is |
| 913 | required for the `map` and `apply` core functions). You will also |
| 914 | need it in step 6 if you choose to not to defer atoms/`swap!` from |
| 915 | that step. |
| 916 | |
| 917 | * The default "apply"/invoke case of `EVAL` must now be changed to |
| 918 | account for the new object/structure returned by the `fn*` form. |
| 919 | Continue to call `eval_ast` on `ast`. The first element of the |
| 920 | result of `eval_ast` is `f` and the remaining elements are in `args`. |
| 921 | Switch on the type of `f`: |
| 922 | * regular function (not one defined by `fn*`): apply/invoke it as |
| 923 | before (in step 4). |
| 924 | * a `fn*` value: set `ast` to the `ast` attribute of `f`. Generate |
| 925 | a new environment using the `env` and `params` attributes of `f` |
| 926 | as the `outer` and `binds` arguments and `args` as the `exprs` |
| 927 | argument. Set `env` to the new environment. Continue at the |
| 928 | beginning of the loop. |
| 929 | |
| 930 | Run some manual tests from previous steps to make sure you have not |
| 931 | broken anything by adding TCO. |
| 932 | |
| 933 | Now go to the top level, run the step 5 tests. |
| 934 | |
| 935 | ``` |
| 936 | make "test^quux^step5" |
| 937 | ``` |
| 938 | |
| 939 | Look at the step 5 test file `tests/step5_tco.mal`. The `sum-to` |
| 940 | function cannot be tail call optimized because it does something after |
| 941 | the recursive call (`sum-to` calls itself and then does the addition). |
| 942 | Lispers say that the `sum-to` is not in tail position. The `sum2` |
| 943 | function however, calls itself from tail position. In other words, the |
| 944 | recursive call to `sum2` is the last action that `sum2` does. Calling |
| 945 | `sum-to` with a large value will cause a stack overflow exception in |
| 946 | most target languages (some have super-special tricks they use to |
| 947 | avoid stack overflows). |
| 948 | |
| 949 | Congratulations, your mal implementation already has a feature (TCO) |
| 950 | that most mainstream languages lack. |
| 951 | |
| 952 | |
| 953 | <a name="step6"></a> |
| 954 | |
| 955 | ### Step 6: Files, Mutation, and Evil |
| 956 | |
| 957 | ![step6_file architecture](step6_file.png) |
| 958 | |
| 959 | In step 5 you added tail call optimization. In this step you will add |
| 960 | some string and file operations and give your implementation a touch |
| 961 | of evil ... er, eval. And as long as your language supports function |
| 962 | closures, this step will be quite simple. However, to complete this |
| 963 | step, you must implement string type support, so if you have been |
| 964 | holding off on that you will need to go back and do so. |
| 965 | |
| 966 | Compare the pseudocode for step 5 and step 6 to get a basic idea of |
| 967 | the changes that will be made during this step: |
| 968 | ``` |
| 969 | diff -urp ../process/step5_tco.txt ../process/step6_file.txt |
| 970 | ``` |
| 971 | |
| 972 | * Copy `step5_tco.qx` to `step6_file.qx`. |
| 973 | |
| 974 | * Add two new string functions to the core namespaces: |
| 975 | * `read-string`: this function just exposes the `read_str` function |
| 976 | from the reader. If your mal string type is not the same as your |
| 977 | target language (e.g. statically typed language) then your |
| 978 | `read-string` function will need to unbox (extract) the raw string |
| 979 | from the mal string type in order to call `read_str`. |
| 980 | * `slurp`: this function takes a file name (string) and returns the |
| 981 | contents of the file as a string. Once again, if your mal string |
| 982 | type wraps a raw target language string, then you will need to |
| 983 | unmarshall (extract) the string parameter to get the raw file name |
| 984 | string and marshall (wrap) the result back to a mal string type. |
| 985 | |
| 986 | * In your main program, add a new symbol "eval" to your REPL |
| 987 | environment. The value of this new entry is a function that takes |
| 988 | a single argument `ast`. The closure calls the your `EVAL` function |
| 989 | using the `ast` as the first argument and the REPL environment |
| 990 | (closed over from outside) as the second argument. The result of |
| 991 | the `EVAL` call is returned. This simple but powerful addition |
| 992 | allows your program to treat mal data as a mal program. For example, |
| 993 | you can now to this: |
| 994 | ``` |
| 995 | (def! mal-prog (list + 1 2)) |
| 996 | (eval mal-prog) |
| 997 | ``` |
| 998 | |
| 999 | * Define a `load-file` function using mal itself. In your main |
| 1000 | program call the `rep` function with this string: |
| 1001 | "(def! load-file (fn* (f) (eval (read-string (str \"(do \" (slurp f) \"\nnil)\")))))". |
| 1002 | |
| 1003 | Try out `load-file`: |
| 1004 | * `(load-file "../tests/incA.mal")` -> `9` |
| 1005 | * `(inc4 3)` -> `7` |
| 1006 | |
| 1007 | The `load-file` function does the following: |
| 1008 | * Call `slurp` to read in a file by name. Surround the contents with |
| 1009 | "(do ...)" so that the whole file will be treated as a single |
| 1010 | program AST (abstract syntax tree). Add a new line in case the files |
| 1011 | ends with a comment. The `nil` ensures a short and predictable result, |
| 1012 | instead of what happens to be the last function defined in the loaded file. |
| 1013 | * Call `read-string` on the string returned from `slurp`. This uses |
| 1014 | the reader to read/convert the file contents into mal data/AST. |
| 1015 | * Call `eval` (the one in the REPL environment) on the AST returned |
| 1016 | from `read-string` to "run" it. |
| 1017 | |
| 1018 | Besides adding file and eval support, we'll add support for the atom data type |
| 1019 | in this step. An atom is the Mal way to represent *state*; it is |
| 1020 | heavily inspired by [Clojure's atoms](http://clojure.org/state). An atom holds |
| 1021 | a reference to a single Mal value of any type; it supports reading that Mal value |
| 1022 | and *modifying* the reference to point to another Mal value. Note that this is |
| 1023 | the only Mal data type that is mutable (but the Mal values it refers to are |
| 1024 | still immutable; immutability is explained in greater detail in step 7). |
| 1025 | You'll need to add 5 functions to the core namespace to support atoms: |
| 1026 | |
| 1027 | * `atom`: Takes a Mal value and returns a new atom which points to that Mal value. |
| 1028 | * `atom?`: Takes an argument and returns `true` if the argument is an atom. |
| 1029 | * `deref`: Takes an atom argument and returns the Mal value referenced by this atom. |
| 1030 | * `reset!`: Takes an atom and a Mal value; the atom is modified to refer to |
| 1031 | the given Mal value. The Mal value is returned. |
| 1032 | * `swap!`: Takes an atom, a function, and zero or more function arguments. The |
| 1033 | atom's value is modified to the result of applying the function with the atom's |
| 1034 | value as the first argument and the optionally given function arguments as |
| 1035 | the rest of the arguments. The new atom's value is returned. (Side note: Mal is |
| 1036 | single-threaded, but in concurrent languages like Clojure, `swap!` promises |
| 1037 | atomic update: `(swap! myatom (fn* [x] (+ 1 x)))` will always increase the |
| 1038 | `myatom` counter by one and will not suffer from missing updates when the |
| 1039 | atom is updated from multiple threads.) |
| 1040 | |
| 1041 | Optionally, you can add a reader macro `@` which will serve as a short form for |
| 1042 | `deref`, so that `@a` is equivalent to `(deref a)`. In order to do that, modify |
| 1043 | the conditional in reader `read_form` function and add a case which deals with |
| 1044 | the `@` token: if the token is `@` (at sign) then return a new list that |
| 1045 | contains the symbol `deref` and the result of reading the next form |
| 1046 | (`read_form`). |
| 1047 | |
| 1048 | Now go to the top level, run the step 6 tests. The optional tests will |
| 1049 | need support from the reader for comments, vectors, hash-maps and the `@` |
| 1050 | reader macro: |
| 1051 | ``` |
| 1052 | make "test^quux^step6" |
| 1053 | ``` |
| 1054 | |
| 1055 | Congratulations, you now have a full-fledged scripting language that |
| 1056 | can run other mal programs. The `slurp` function loads a file as |
| 1057 | a string, the `read-string` function calls the mal reader to turn that |
| 1058 | string into data, and the `eval` function takes data and evaluates it |
| 1059 | as a normal mal program. However, it is important to note that the |
| 1060 | `eval` function is not just for running external programs. Because mal |
| 1061 | programs are regular mal data structures, you can dynamically generate |
| 1062 | or manipulate those data structures before calling `eval` on them. |
| 1063 | This isomorphism (same shape) between data and programs is known as |
| 1064 | "homoiconicity". Lisp languages are homoiconic and this property |
| 1065 | distinguishes them from most other programming languages. |
| 1066 | |
| 1067 | You mal implementation is quite powerful already but the set of |
| 1068 | functions that are available (from `core.qx`) is fairly limited. The |
| 1069 | bulk of the functions you will add are described in step 9 and step A, |
| 1070 | but you will begin to flesh them out over the next few steps to |
| 1071 | support quoting (step 7) and macros (step 8). |
| 1072 | |
| 1073 | |
| 1074 | #### Deferrable: |
| 1075 | |
| 1076 | * Add the ability to run another mal program from the command line. |
| 1077 | Prior to the REPL loop, check if your mal implementation is called |
| 1078 | with command line arguments. If so, treat the first argument as |
| 1079 | a filename and use `rep` to call `load-file` on that filename, and |
| 1080 | finally exit/terminate execution. |
| 1081 | |
| 1082 | * Add the rest of the command line arguments to your REPL environment |
| 1083 | so that programs that are run with `load-file` have access to their |
| 1084 | calling environment. Add a new "\*ARGV\*" (symbol) entry to your REPL |
| 1085 | environment. The value of this entry should be the rest of the |
| 1086 | command line arguments as a mal list value. |
| 1087 | |
| 1088 | |
| 1089 | <a name="step7"></a> |
| 1090 | |
| 1091 | ### Step 7: Quoting |
| 1092 | |
| 1093 | ![step7_quote architecture](step7_quote.png) |
| 1094 | |
| 1095 | In step 7 you will add the special forms `quote` and `quasiquote` and |
| 1096 | add supporting core functions `cons` and `concat`. The two quote forms |
| 1097 | add a powerful abstraction for manipulating mal code itself |
| 1098 | (meta-programming). |
| 1099 | |
| 1100 | The `quote` special form indicates to the evaluator (`EVAL`) that the |
| 1101 | parameter should not be evaluated (yet). At first glance, this might |
| 1102 | not seem particularly useful but an example of what this enables is the |
| 1103 | ability for a mal program to refer to a symbol itself rather than the |
| 1104 | value that it evaluates to. Likewise with lists. For example, consider |
| 1105 | the following: |
| 1106 | |
| 1107 | * `(prn abc)`: this will lookup the symbol `abc` in the current |
| 1108 | evaluation environment and print it. This will result in error if |
| 1109 | `abc` is not defined. |
| 1110 | * `(prn (quote abc))`: this will print "abc" (prints the symbol |
| 1111 | itself). This will work regardless of whether `abc` is defined in |
| 1112 | the current environment. |
| 1113 | * `(prn (1 2 3))`: this will result in an error because `1` is not |
| 1114 | a function and cannot be applied to the arguments `(2 3)`. |
| 1115 | * `(prn (quote (1 2 3)))`: this will print "(1 2 3)". |
| 1116 | * `(def! l (quote (1 2 3)))`: list quoting allows us to define lists |
| 1117 | directly in the code (list literal). Another way of doing this is |
| 1118 | with the list function: `(def! l (list 1 2 3))`. |
| 1119 | |
| 1120 | The second special quoting form is `quasiquote`. This allows a quoted |
| 1121 | list to have internal elements of the list that are temporarily |
| 1122 | unquoted (normal evaluation). There are two special forms that only |
| 1123 | mean something within a quasiquoted list: `unquote` and |
| 1124 | `splice-unquote`. These are perhaps best explained with some examples: |
| 1125 | |
| 1126 | * `(def! lst (quote (b c)))` -> `(b c)` |
| 1127 | * `(quasiquote (a lst d))` -> `(a lst d)` |
| 1128 | * `(quasiquote (a (unquote lst) d))` -> `(a (b c) d)` |
| 1129 | * `(quasiquote (a (splice-unquote lst) d))` -> `(a b c d)` |
| 1130 | |
| 1131 | The `unquote` form turns evaluation back on for its argument and the |
| 1132 | result of evaluation is put in place into the quasiquoted list. The |
| 1133 | `splice-unquote` also turns evaluation back on for its argument, but |
| 1134 | the evaluated value must be a list which is then "spliced" into the |
| 1135 | quasiquoted list. The true power of the quasiquote form will be |
| 1136 | manifest when it is used together with macros (in the next step). |
| 1137 | |
| 1138 | Compare the pseudocode for step 6 and step 7 to get a basic idea of |
| 1139 | the changes that will be made during this step: |
| 1140 | ``` |
| 1141 | diff -urp ../process/step6_file.txt ../process/step7_quote.txt |
| 1142 | ``` |
| 1143 | |
| 1144 | * Copy `step6_file.qx` to `step7_quote.qx`. |
| 1145 | |
| 1146 | * Before implementing the quoting forms, you will need to implement |
| 1147 | some supporting functions in the core namespace: |
| 1148 | * `cons`: this function takes a list as its second |
| 1149 | parameter and returns a new list that has the first argument |
| 1150 | prepended to it. |
| 1151 | * `concat`: this functions takes 0 or more lists as |
| 1152 | parameters and returns a new list that is a concatenation of all |
| 1153 | the list parameters. |
| 1154 | |
| 1155 | An aside on immutability: note that neither cons or concat mutate |
| 1156 | their original list arguments. Any references to them (i.e. other |
| 1157 | lists that they may be "contained" in) will still refer to the |
| 1158 | original unchanged value. Mal, like Clojure, is a language which uses |
| 1159 | immutable data structures. I encourage you to read about the power and |
| 1160 | importance of immutability as implemented in Clojure (from which |
| 1161 | Mal borrows most of its syntax and feature-set). |
| 1162 | |
| 1163 | * Add the `quote` special form. This form just returns its argument |
| 1164 | (the second list element of `ast`). |
| 1165 | |
| 1166 | * Add the `quasiquote` function. |
| 1167 | The `quasiquote` function takes a parameter `ast` and has the |
| 1168 | following conditional. |
| 1169 | - If `ast` is a list starting with the "unquote" symbol, return its |
| 1170 | second element. |
| 1171 | - If `ast` is a list failing previous test, the result will be a |
| 1172 | list populated by the following process. |
| 1173 | |
| 1174 | The result is initially an empty list. |
| 1175 | Iterate over each element `elt` of `ast` in reverse order: |
| 1176 | - If `elt` is a list starting with the "splice-unquote" symbol, |
| 1177 | replace the current result with a list containing: |
| 1178 | the "concat" symbol, |
| 1179 | the second element of `elt`, |
| 1180 | then the previous result. |
| 1181 | - Else replace the current result with a list containing: |
| 1182 | the "cons" symbol, |
| 1183 | the result of calling `quasiquote` with `elt` as argument, |
| 1184 | then the previous result. |
| 1185 | |
| 1186 | This process can also be described recursively: |
| 1187 | - If `ast` is empty return it unchanged. else let `elt` be its |
| 1188 | first element. |
| 1189 | - If `elt` is a list starting with the "splice-unquote" symbol, |
| 1190 | return a list containing: |
| 1191 | the "concat" symbol, |
| 1192 | the second element of `elt`, |
| 1193 | then the result of processing the rest of `ast`. |
| 1194 | - Else return a list containing: |
| 1195 | the "cons" symbol, |
| 1196 | the result of calling `quasiquote` with `elt` as argument, |
| 1197 | then the result of processing the rest of `ast`. |
| 1198 | - If `ast` is a map or a symbol, return a list containing: |
| 1199 | the "quote" symbol, |
| 1200 | then `ast`. |
| 1201 | - Else return `ast` unchanged. |
| 1202 | Such forms are not affected by evaluation, so you may quote them |
| 1203 | as in the previous case if implementation is easyer. |
| 1204 | |
| 1205 | * Optionally, add a the `quasiquoteexpand` special form. |
| 1206 | This form calls the `quasiquote` function using the first `ast` |
| 1207 | argument (second list element) and returns the result. |
| 1208 | It has no other practical purpose than testing your implementation |
| 1209 | of the `quasiquote` internal function. |
| 1210 | |
| 1211 | * Add the `quasiquote` special form. |
| 1212 | This form does the same than `quasiquoteexpand`, |
| 1213 | but evaluates the result in the current environment before returning it, |
| 1214 | either by recursively calling `EVAL` with the result and `env`, |
| 1215 | or by assigning `ast` with the result and continuing execution at |
| 1216 | the top of the loop (TCO). |
| 1217 | |
| 1218 | Now go to the top level, run the step 7 tests: |
| 1219 | ``` |
| 1220 | make "test^quux^step7" |
| 1221 | ``` |
| 1222 | |
| 1223 | Quoting is one of the more mundane functions available in mal, but do |
| 1224 | not let that discourage you. Your mal implementation is almost |
| 1225 | complete, and quoting sets the stage for the next very exciting step: |
| 1226 | macros. |
| 1227 | |
| 1228 | |
| 1229 | #### Deferrable |
| 1230 | |
| 1231 | * The full names for the quoting forms are fairly verbose. Most Lisp |
| 1232 | languages have a short-hand syntax and Mal is no exception. These |
| 1233 | short-hand syntaxes are known as reader macros because they allow us |
| 1234 | to manipulate mal code during the reader phase. Macros that run |
| 1235 | during the eval phase are just called "macros" and are described in |
| 1236 | the next section. Expand the conditional with reader `read_form` |
| 1237 | function to add the following four cases: |
| 1238 | * token is "'" (single quote): return a new list that contains the |
| 1239 | symbol "quote" and the result of reading the next form |
| 1240 | (`read_form`). |
| 1241 | * token is "\`" (back-tick): return a new list that contains the |
| 1242 | symbol "quasiquote" and the result of reading the next form |
| 1243 | (`read_form`). |
| 1244 | * token is "~" (tilde): return a new list that contains the |
| 1245 | symbol "unquote" and the result of reading the next form |
| 1246 | (`read_form`). |
| 1247 | * token is "~@" (tilde + at sign): return a new list that contains |
| 1248 | the symbol "splice-unquote" and the result of reading the next |
| 1249 | form (`read_form`). |
| 1250 | |
| 1251 | * Add support for quoting of vectors. `cons` |
| 1252 | should also accept a vector as the second argument. The return value |
| 1253 | is a list regardless. `concat` should support concatenation of |
| 1254 | lists, vectors, or a mix of both. The result is always a list. |
| 1255 | |
| 1256 | Implement a core function `vec` turning a list into a vector with |
| 1257 | the same elements. If provided a vector, `vec` should return it |
| 1258 | unchanged. |
| 1259 | |
| 1260 | In the `quasiquote` function, when `ast` is a vector, |
| 1261 | return a list containing: |
| 1262 | the "vec" symbol, |
| 1263 | then the result of processing `ast` as if it were a list not |
| 1264 | starting with `quote`. |
| 1265 | |
| 1266 | <a name="step8"></a> |
| 1267 | |
| 1268 | ### Step 8: Macros |
| 1269 | |
| 1270 | ![step8_macros architecture](step8_macros.png) |
| 1271 | |
| 1272 | Your mal implementation is now ready for one of the most lispy and |
| 1273 | exciting of all programming concepts: macros. In the previous step, |
| 1274 | quoting enabled some simple manipulation data structures and therefore |
| 1275 | manipulation of mal code (because the `eval` function from step |
| 1276 | 6 turns mal data into code). In this step you will be able to mark mal |
| 1277 | functions as macros which can manipulate mal code before it is |
| 1278 | evaluated. In other words, macros are user-defined special forms. Or |
| 1279 | to look at it another way, macros allow mal programs to redefine |
| 1280 | the mal language itself. |
| 1281 | |
| 1282 | Compare the pseudocode for step 7 and step 8 to get a basic idea of |
| 1283 | the changes that will be made during this step: |
| 1284 | ``` |
| 1285 | diff -urp ../process/step7_quote.txt ../process/step8_macros.txt |
| 1286 | ``` |
| 1287 | |
| 1288 | * Copy `step7_quote.qx` to `step8_macros.qx`. |
| 1289 | |
| 1290 | |
| 1291 | You might think that the infinite power of macros would require some |
| 1292 | sort of complex mechanism, but the implementation is actually fairly |
| 1293 | simple. |
| 1294 | |
| 1295 | * Add a new attribute `is_macro` to mal function types. This should |
| 1296 | default to false. |
| 1297 | |
| 1298 | * Add a new special form `defmacro!`. This is very similar to the |
| 1299 | `def!` form, but before the evaluated value (mal function) is set in |
| 1300 | the environment, the `is_macro` attribute should be set to true. |
| 1301 | |
| 1302 | * Add a `is_macro_call` function: This function takes arguments `ast` |
| 1303 | and `env`. It returns true if `ast` is a list that contains a symbol |
| 1304 | as the first element and that symbol refers to a function in the |
| 1305 | `env` environment and that function has the `is_macro` attribute set |
| 1306 | to true. Otherwise, it returns false. |
| 1307 | |
| 1308 | * Add a `macroexpand` function: This function takes arguments `ast` |
| 1309 | and `env`. It calls `is_macro_call` with `ast` and `env` and loops |
| 1310 | while that condition is true. Inside the loop, the first element of |
| 1311 | the `ast` list (a symbol), is looked up in the environment to get |
| 1312 | the macro function. This macro function is then called/applied with |
| 1313 | the rest of the `ast` elements (2nd through the last) as arguments. |
| 1314 | The return value of the macro call becomes the new value of `ast`. |
| 1315 | When the loop completes because `ast` no longer represents a macro |
| 1316 | call, the current value of `ast` is returned. |
| 1317 | |
| 1318 | * In the evaluator (`EVAL`) before the special forms switch (apply |
| 1319 | section), perform macro expansion by calling the `macroexpand` |
| 1320 | function with the current value of `ast` and `env`. Set `ast` to the |
| 1321 | result of that call. If the new value of `ast` is no longer a list |
| 1322 | after macro expansion, then return the result of calling `eval_ast` |
| 1323 | on it, otherwise continue with the rest of the apply section |
| 1324 | (special forms switch). |
| 1325 | |
| 1326 | * Add a new special form condition for `macroexpand`. Call the |
| 1327 | `macroexpand` function using the first `ast` argument (second list |
| 1328 | element) and `env`. Return the result. This special form allows |
| 1329 | a mal program to do explicit macro expansion without applying the |
| 1330 | result (which can be useful for debugging macro expansion). |
| 1331 | |
| 1332 | Now go to the top level, run the step 8 tests: |
| 1333 | ``` |
| 1334 | make "test^quux^step8" |
| 1335 | ``` |
| 1336 | |
| 1337 | There is a reasonably good chance that the macro tests will not pass |
| 1338 | the first time. Although the implementation of macros is fairly |
| 1339 | simple, debugging runtime bugs with macros can be fairly tricky. If |
| 1340 | you do run into subtle problems that are difficult to solve, let me |
| 1341 | recommend a couple of approaches: |
| 1342 | |
| 1343 | * Use the macroexpand special form to eliminate one of the layers of |
| 1344 | indirection (to expand but skip evaluate). This will often reveal |
| 1345 | the source of the issue. |
| 1346 | * Add a debug print statement to the top of your main `eval` function |
| 1347 | (inside the TCO loop) to print the current value of `ast` (hint use |
| 1348 | `pr_str` to get easier to debug output). Pull up the step8 |
| 1349 | implementation from another language and uncomment its `eval` |
| 1350 | function (yes, I give you permission to violate the rule this once). |
| 1351 | Run the two side-by-side. The first difference is likely to point to |
| 1352 | the bug. |
| 1353 | |
| 1354 | Congratulations! You now have a Lisp interpreter with a super power |
| 1355 | that most non-Lisp languages can only dream of (I have it on good |
| 1356 | authority that languages dream when you are not using them). If you |
| 1357 | are not already familiar with Lisp macros, I suggest the following |
| 1358 | exercise: write a recursive macro that handles postfixed mal code |
| 1359 | (with the function as the last parameter instead of the first). Or |
| 1360 | not. I have not actually done so myself, but I have heard it is an |
| 1361 | interesting exercise. |
| 1362 | |
| 1363 | In the next step you will add try/catch style exception handling to |
| 1364 | your implementation in addition to some new core functions. After |
| 1365 | step9 you will be very close to having a fully self-hosting mal |
| 1366 | implementation. Let us continue! |
| 1367 | |
| 1368 | |
| 1369 | #### Deferrable |
| 1370 | |
| 1371 | * Add the following new core functions which are frequently used in |
| 1372 | macro functions: |
| 1373 | * `nth`: this function takes a list (or vector) and a number (index) |
| 1374 | as arguments, returns the element of the list at the given index. |
| 1375 | If the index is out of range, this function raises an exception. |
| 1376 | * `first`: this function takes a list (or vector) as its argument |
| 1377 | and return the first element. If the list (or vector) is empty or |
| 1378 | is `nil` then `nil` is returned. |
| 1379 | * `rest`: this function takes a list (or vector) as its argument and |
| 1380 | returns a new list containing all the elements except the first. If |
| 1381 | the list (or vector) is empty or is `nil` then `()` (empty list) |
| 1382 | is returned. |
| 1383 | |
| 1384 | * In the main program, call the `rep` function with the following |
| 1385 | string argument to define a new control structure. |
| 1386 | ``` |
| 1387 | "(defmacro! cond (fn* (& xs) (if (> (count xs) 0) (list 'if (first xs) (if (> (count xs) 1) (nth xs 1) (throw \"odd number of forms to cond\")) (cons 'cond (rest (rest xs)))))))" |
| 1388 | ``` |
| 1389 | * Note that `cond` calls the `throw` function when `cond` is |
| 1390 | called with an odd number of args. The `throw` function is |
| 1391 | implemented in the next step, but it will still serve it's |
| 1392 | purpose here by causing an undefined symbol error. |
| 1393 | |
| 1394 | |
| 1395 | <a name="step9"></a> |
| 1396 | |
| 1397 | ### Step 9: Try |
| 1398 | |
| 1399 | ![step9_try architecture](step9_try.png) |
| 1400 | |
| 1401 | In this step you will implement the final mal special form for |
| 1402 | error/exception handling: `try*/catch*`. You will also add several core |
| 1403 | functions to your implementation. In particular, you will enhance the |
| 1404 | functional programming pedigree of your implementation by adding the |
| 1405 | `apply` and `map` core functions. |
| 1406 | |
| 1407 | Compare the pseudocode for step 8 and step 9 to get a basic idea of |
| 1408 | the changes that will be made during this step: |
| 1409 | ``` |
| 1410 | diff -urp ../process/step8_macros.txt ../process/step9_try.txt |
| 1411 | ``` |
| 1412 | |
| 1413 | * Copy `step8_macros.qx` to `step9_try.qx`. |
| 1414 | |
| 1415 | * Add the `try*/catch*` special form to the EVAL function. The |
| 1416 | try catch form looks like this: `(try* A (catch* B C))`. The form |
| 1417 | `A` is evaluated, if it throws an exception, then form `C` is |
| 1418 | evaluated with a new environment that binds the symbol `B` to the |
| 1419 | value of the exception that was thrown. |
| 1420 | * If your target language has built-in try/catch style exception |
| 1421 | handling then you are already 90% of the way done. Add a |
| 1422 | (native language) try/catch block that evaluates `A` within |
| 1423 | the try block and catches all exceptions. If an exception is |
| 1424 | caught, then translate it to a mal type/value. For native |
| 1425 | exceptions this is either the message string or a mal hash-map |
| 1426 | that contains the message string and other attributes of the |
| 1427 | exception. When a regular mal type/value is used as an |
| 1428 | exception, you will probably need to store it within a native |
| 1429 | exception type in order to be able to convey/transport it using |
| 1430 | the native try/catch mechanism. Then you will extract the mal |
| 1431 | type/value from the native exception. Create a new mal environment |
| 1432 | that binds `B` to the value of the exception. Finally, evaluate `C` |
| 1433 | using that new environment. |
| 1434 | * If your target language does not have built-in try/catch style |
| 1435 | exception handling then you have some extra work to do. One of the |
| 1436 | most straightforward approaches is to create a a global error |
| 1437 | variable that stores the thrown mal type/value. The complication |
| 1438 | is that there are a bunch of places where you must check to see if |
| 1439 | the global error state is set and return without proceeding. The |
| 1440 | rule of thumb is that this check should happen at the top of your |
| 1441 | EVAL function and also right after any call to EVAL (and after any |
| 1442 | function call that might happen to call EVAL further down the |
| 1443 | chain). Yes, it is ugly, but you were warned in the section on |
| 1444 | picking a language. |
| 1445 | |
| 1446 | * Add the `throw` core function. |
| 1447 | * If your language supports try/catch style exception handling, then |
| 1448 | this function takes a mal type/value and throws/raises it as an |
| 1449 | exception. In order to do this, you may need to create a custom |
| 1450 | exception object that wraps a mal value/type. |
| 1451 | * If your language does not support try/catch style exception |
| 1452 | handling, then set the global error state to the mal type/value. |
| 1453 | |
| 1454 | * Add the `apply` and `map` core functions. In step 5, if you did not |
| 1455 | add the original function (`fn`) to the structure returned from |
| 1456 | `fn*`, the you will need to do so now. |
| 1457 | * `apply`: takes at least two arguments. The first argument is |
| 1458 | a function and the last argument is list (or vector). The |
| 1459 | arguments between the function and the last argument (if there are |
| 1460 | any) are concatenated with the final argument to create the |
| 1461 | arguments that are used to call the function. The apply |
| 1462 | function allows a function to be called with arguments that are |
| 1463 | contained in a list (or vector). In other words, `(apply F A B [C |
| 1464 | D])` is equivalent to `(F A B C D)`. |
| 1465 | * `map`: takes a function and a list (or vector) and evaluates the |
| 1466 | function against every element of the list (or vector) one at |
| 1467 | a time and returns the results as a list. |
| 1468 | |
| 1469 | * Add some type predicates core functions. In Lisp, predicates are |
| 1470 | functions that return true/false (or true value/nil) and typically |
| 1471 | end in "?" or "p". |
| 1472 | * `nil?`: takes a single argument and returns true (mal true value) |
| 1473 | if the argument is nil (mal nil value). |
| 1474 | * `true?`: takes a single argument and returns true (mal true value) |
| 1475 | if the argument is a true value (mal true value). |
| 1476 | * `false?`: takes a single argument and returns true (mal true |
| 1477 | value) if the argument is a false value (mal false value). |
| 1478 | * `symbol?`: takes a single argument and returns true (mal true |
| 1479 | value) if the argument is a symbol (mal symbol value). |
| 1480 | |
| 1481 | Now go to the top level, run the step 9 tests: |
| 1482 | ``` |
| 1483 | make "test^quux^step9" |
| 1484 | ``` |
| 1485 | |
| 1486 | Your mal implementation is now essentially a fully featured Lisp |
| 1487 | interpreter. But if you stop now you will miss one of the most |
| 1488 | satisfying and enlightening aspects of creating a mal implementation: |
| 1489 | self-hosting. |
| 1490 | |
| 1491 | #### Deferrable |
| 1492 | |
| 1493 | * Add the following new core functions: |
| 1494 | * `symbol`: takes a string and returns a new symbol with the string |
| 1495 | as its name. |
| 1496 | * `keyword`: takes a string and returns a keyword with the same name |
| 1497 | (usually just be prepending the special keyword |
| 1498 | unicode symbol). This function should also detect if the argument |
| 1499 | is already a keyword and just return it. |
| 1500 | * `keyword?`: takes a single argument and returns true (mal true |
| 1501 | value) if the argument is a keyword, otherwise returns false (mal |
| 1502 | false value). |
| 1503 | * `vector`: takes a variable number of arguments and returns |
| 1504 | a vector containing those arguments. |
| 1505 | * `vector?`: takes a single argument and returns true (mal true |
| 1506 | value) if the argument is a vector, otherwise returns false (mal |
| 1507 | false value). |
| 1508 | * `sequential?`: takes a single argument and returns true (mal true |
| 1509 | value) if it is a list or a vector, otherwise returns false (mal |
| 1510 | false value). |
| 1511 | * `hash-map`: takes a variable but even number of arguments and |
| 1512 | returns a new mal hash-map value with keys from the odd arguments |
| 1513 | and values from the even arguments respectively. This is basically |
| 1514 | the functional form of the `{}` reader literal syntax. |
| 1515 | * `map?`: takes a single argument and returns true (mal true |
| 1516 | value) if the argument is a hash-map, otherwise returns false (mal |
| 1517 | false value). |
| 1518 | * `assoc`: takes a hash-map as the first argument and the remaining |
| 1519 | arguments are odd/even key/value pairs to "associate" (merge) into |
| 1520 | the hash-map. Note that the original hash-map is unchanged |
| 1521 | (remember, mal values are immutable), and a new hash-map |
| 1522 | containing the old hash-maps key/values plus the merged key/value |
| 1523 | arguments is returned. |
| 1524 | * `dissoc`: takes a hash-map and a list of keys to remove from the |
| 1525 | hash-map. Again, note that the original hash-map is unchanged and |
| 1526 | a new hash-map with the keys removed is returned. Key arguments |
| 1527 | that do not exist in the hash-map are ignored. |
| 1528 | * `get`: takes a hash-map and a key and returns the value of looking |
| 1529 | up that key in the hash-map. If the key is not found in the |
| 1530 | hash-map then nil is returned. |
| 1531 | * `contains?`: takes a hash-map and a key and returns true (mal true |
| 1532 | value) if the key exists in the hash-map and false (mal false |
| 1533 | value) otherwise. |
| 1534 | * `keys`: takes a hash-map and returns a list (mal list value) of |
| 1535 | all the keys in the hash-map. |
| 1536 | * `vals`: takes a hash-map and returns a list (mal list value) of |
| 1537 | all the values in the hash-map. |
| 1538 | |
| 1539 | |
| 1540 | <a name="stepA"></a> |
| 1541 | |
| 1542 | ### Step A: Metadata, Self-hosting and Interop |
| 1543 | |
| 1544 | ![stepA_mal architecture](stepA_mal.png) |
| 1545 | |
| 1546 | You have reached the final step of your mal implementation. This step |
| 1547 | is kind of a catchall for things that did not fit into other steps. |
| 1548 | But most importantly, the changes you make in this step will unlock |
| 1549 | the magical power known as "self-hosting". You might have noticed |
| 1550 | that one of the languages that mal is implemented in is "mal". Any mal |
| 1551 | implementation that is complete enough can run the mal implementation |
| 1552 | of mal. You might need to pull out your hammock and ponder this for |
| 1553 | a while if you have never built a compiler or interpreter before. Look |
| 1554 | at the step source files for the mal implementation of mal (it is not |
| 1555 | cheating now that you have reached step A). |
| 1556 | |
| 1557 | If you deferred the implementation of keywords, vectors and hash-maps, |
| 1558 | now is the time to go back and implement them if you want your |
| 1559 | implementation to self-host. |
| 1560 | |
| 1561 | Compare the pseudocode for step 9 and step A to get a basic idea of |
| 1562 | the changes that will be made during this step: |
| 1563 | ``` |
| 1564 | diff -urp ../process/step9_try.txt ../process/stepA_mal.txt |
| 1565 | ``` |
| 1566 | |
| 1567 | * Copy `step9_try.qx` to `stepA_mal.qx`. |
| 1568 | |
| 1569 | * Add the `readline` core function. This functions takes a |
| 1570 | string that is used to prompt the user for input. The line of text |
| 1571 | entered by the user is returned as a string. If the user sends an |
| 1572 | end-of-file (usually Ctrl-D), then nil is returned. |
| 1573 | |
| 1574 | * Add a new "\*host-language\*" (symbol) entry to your REPL |
| 1575 | environment. The value of this entry should be a mal string |
| 1576 | containing the name of the current implementation. |
| 1577 | |
| 1578 | * When the REPL starts up (as opposed to when it is called with |
| 1579 | a script and/or arguments), call the `rep` function with this string |
| 1580 | to print a startup header: |
| 1581 | "(println (str \"Mal [\" \*host-language\* \"]\"))". |
| 1582 | |
| 1583 | * Ensure that the REPL environment contains definitions for `time-ms`, |
| 1584 | `meta`, `with-meta`, `fn?` |
| 1585 | `string?`, `number?`, `seq`, and `conj`. It doesn't really matter |
| 1586 | what they do at this stage: they just need to be defined. Making |
| 1587 | them functions that raise a "not implemented" exception would be |
| 1588 | fine. |
| 1589 | |
| 1590 | Now go to the top level, run the step A tests: |
| 1591 | ``` |
| 1592 | make "test^quux^stepA" |
| 1593 | ``` |
| 1594 | |
| 1595 | Once you have passed all the non-optional step A tests, it is time to |
| 1596 | try self-hosting. Run your step A implementation as normal, but use |
| 1597 | the file argument mode you added in step 6 to run a each of the step |
| 1598 | from the mal implementation: |
| 1599 | ``` |
| 1600 | ./stepA_mal.qx ../mal/step1_read_print.mal |
| 1601 | ./stepA_mal.qx ../mal/step2_eval.mal |
| 1602 | ... |
| 1603 | ./stepA_mal.qx ../mal/step9_try.mal |
| 1604 | ./stepA_mal.qx ../mal/stepA_mal.mal |
| 1605 | ``` |
| 1606 | |
| 1607 | There is a very good chance that you will encounter an error at some |
| 1608 | point while trying to run the mal in mal implementation steps above. |
| 1609 | Debugging failures that happen while self-hosting is MUCH more |
| 1610 | difficult and mind bending. One of the best approaches I have |
| 1611 | personally found is to add prn statements to the mal implementation |
| 1612 | step (not your own implementation of mal) that is causing problems. |
| 1613 | |
| 1614 | Another approach I have frequently used is to pull out the code from |
| 1615 | the mal implementation that is causing the problem and simplify it |
| 1616 | step by step until you have a simple piece of mal code that still |
| 1617 | reproduces the problem. Once the reproducer is simple enough you will |
| 1618 | probably know where in your own implementation that problem is likely |
| 1619 | to be. Please add your simple reproducer as a test case so that future |
| 1620 | implementers will fix similar issues in their code before they get to |
| 1621 | self-hosting when it is much more difficult to track down and fix. |
| 1622 | |
| 1623 | Once you can manually run all the self-hosted steps, it is time to run |
| 1624 | all the tests in self-hosted mode: |
| 1625 | ``` |
| 1626 | make MAL_IMPL=quux "test^mal" |
| 1627 | ``` |
| 1628 | |
| 1629 | When you run into problems (which you almost certainly will), use the |
| 1630 | same process described above to debug them. |
| 1631 | |
| 1632 | Congratulations!!! When all the tests pass, you should pause for |
| 1633 | a moment and consider what you have accomplished. You have implemented |
| 1634 | a Lisp interpreter that is powerful and complete enough to run a large |
| 1635 | mal program which is itself an implementation of the mal language. You |
| 1636 | might even be asking if you can continue the "inception" by using your |
| 1637 | implementation to run a mal implementation which itself runs the mal |
| 1638 | implementation. |
| 1639 | |
| 1640 | |
| 1641 | #### Optional additions |
| 1642 | |
| 1643 | * Add meta-data support to composite data types (lists, vectors |
| 1644 | and hash-maps), and to functions (native or not), by adding a new |
| 1645 | metadata attribute that refers to another mal value/type |
| 1646 | (nil by default). Add the following metadata related core functions |
| 1647 | (and remove any stub versions): |
| 1648 | * `meta`: this takes a single mal function argument and returns the |
| 1649 | value of the metadata attribute. |
| 1650 | * `with-meta`: this function takes two arguments. The first argument |
| 1651 | is a mal function and the second argument is another mal |
| 1652 | value/type to set as metadata. A copy of the mal function is |
| 1653 | returned that has its `meta` attribute set to the second argument. |
| 1654 | Note that it is important that the environment and macro attribute |
| 1655 | of mal function are retained when it is copied. |
| 1656 | * Add a reader-macro that expands the token "^" to |
| 1657 | return a new list that contains the symbol "with-meta" and the |
| 1658 | result of reading the next next form (2nd argument) (`read_form`) and the |
| 1659 | next form (1st argument) in that order |
| 1660 | (metadata comes first with the ^ macro and the function second). |
| 1661 | * If you implemented as `defmacro!` to mutate an existing function |
| 1662 | without copying it, you can now use the function copying mechanism |
| 1663 | used for metadata to make functions immutable even in the |
| 1664 | defmacro! case... |
| 1665 | |
| 1666 | * Add the following new core functions (and remove any stub versions): |
| 1667 | * `time-ms`: takes no arguments and returns the number of |
| 1668 | milliseconds since epoch (00:00:00 UTC January 1, 1970), or, if |
| 1669 | not possible, since another point in time (`time-ms` is usually |
| 1670 | used relatively to measure time durations). After `time-ms` is |
| 1671 | implemented, you can run the performance micro-benchmarks by |
| 1672 | running `make perf^quux`. |
| 1673 | * `conj`: takes a collection and one or more elements as arguments |
| 1674 | and returns a new collection which includes the original |
| 1675 | collection and the new elements. If the collection is a list, a |
| 1676 | new list is returned with the elements inserted at the start of |
| 1677 | the given list in opposite order; if the collection is a vector, a |
| 1678 | new vector is returned with the elements added to the end of the |
| 1679 | given vector. |
| 1680 | * `string?`: returns true if the parameter is a string. |
| 1681 | * `number?`: returns true if the parameter is a number. |
| 1682 | * `fn?`: returns true if the parameter is a function (internal or |
| 1683 | user-defined). |
| 1684 | * `macro?`: returns true if the parameter is a macro. |
| 1685 | * `seq`: takes a list, vector, string, or nil. If an empty list, |
| 1686 | empty vector, or empty string ("") is passed in then nil is |
| 1687 | returned. Otherwise, a list is returned unchanged, a vector is |
| 1688 | converted into a list, and a string is converted to a list that |
| 1689 | containing the original string split into single character |
| 1690 | strings. |
| 1691 | * For interop with the target language, add this core function: |
| 1692 | * `quux-eval`: takes a string, evaluates it in the target language, |
| 1693 | and returns the result converted to the relevant Mal type. You |
| 1694 | may also add other interop functions as you see fit; Clojure, for |
| 1695 | example, has a function called `.` which allows calling Java |
| 1696 | methods. If the target language is a static language, consider |
| 1697 | using FFI or some language-specific reflection mechanism, if |
| 1698 | available. The tests for `quux-eval` and any other interop |
| 1699 | function should be added in `quux/tests/stepA_mal.mal` (see the |
| 1700 | [tests for `lua-eval`](../lua/tests/stepA_mal.mal) as an example). |
| 1701 | |
| 1702 | ### Next Steps |
| 1703 | |
| 1704 | * Join the #mal IRC channel. It's fairly quiet but there are bursts of |
| 1705 | interesting conversation related to mal, Lisps, esoteric programming |
| 1706 | languages, etc. |
| 1707 | * If you have created an implementation for a new target language (or |
| 1708 | a unique and interesting variant of an existing implementation), |
| 1709 | consider sending a pull request to add it into the main mal |
| 1710 | repository. The [FAQ](../docs/FAQ.md#will-you-add-my-new-implementation) |
| 1711 | describes general requirements for getting an implementation merged |
| 1712 | into the main repository. |
| 1713 | * Take your interpreter implementation and have it emit source code in |
| 1714 | the target language rather than immediately evaluating it. In other |
| 1715 | words, create a compiler. |
| 1716 | * Pick a new target language and implement mal in it. Pick a language |
| 1717 | that is very different from any that you already know. |
| 1718 | * Use your mal implementation to implement a real world project. Many |
| 1719 | of these will force you to address interop. Some ideas: |
| 1720 | * Web server (with mal as CGI language for extra points) |
| 1721 | * An IRC/Slack chat bot |
| 1722 | * An editor (GUI or curses) with mal as a scripting/extension |
| 1723 | language. |
| 1724 | * An AI player for a game like Chess or Go. |
| 1725 | * Implement a feature in your mal implementation that is not covered |
| 1726 | by this guide. Some ideas: |
| 1727 | * Namespaces |
| 1728 | * Multi-threading support |
| 1729 | * Errors with line numbers and/or stack traces. |
| 1730 | * Lazy sequences |
| 1731 | * Clojure-style protocols |
| 1732 | * Full call/cc (call-with-current-continuation) support |
| 1733 | * Explicit TCO (i.e. `recur`) with tail-position error checking |