Commit | Line | Data |
---|---|---|
0f4ca9d1 JM |
1 | # The Make-A-Lisp Process |
2 | ||
3 | So you want to write a Lisp interpreter? Welcome! | |
4 | ||
5 | The goal of the Make-A-Lisp project is to make it easy to write your | |
6 | own Lisp interpreter without sacrificing those many "Aha!" moments | |
7 | that come from ascending the McCarthy mountain. When you reach the peak | |
8 | of this particular mountain, you will have an interpreter for the mal | |
9 | Lisp language that is powerful enough to be self-hosting, meaning it | |
10 | will be able to run a mal interpreter written in mal itself. | |
11 | ||
12 | So jump right in (er ... start the climb)! | |
13 | ||
dc791440 DM |
14 | - [Pick a language](#pick-a-language) |
15 | - [Getting started](#getting-started) | |
16 | - [General hints](#general-hints) | |
17 | - [The Make-A-Lisp Process](#the-make-a-lisp-process-1) | |
18 | - [Step 0: The REPL](#step-0-the-repl) | |
19 | - [Step 1: Read and Print](#step-1-read-and-print) | |
20 | - [Step 2: Eval](#step-2-eval) | |
21 | - [Step 3: Environments](#step-3-environments) | |
22 | - [Step 4: If Fn Do](#step-4-if-fn-do) | |
23 | - [Step 5: Tail call optimization](#step-5-tail-call-optimization) | |
24 | - [Step 6: Files, Mutation, and Evil](#step-6-files-mutation-and-evil) | |
25 | - [Step 7: Quoting](#step-7-quoting) | |
26 | - [Step 8: Macros](#step-8-macros) | |
27 | - [Step 9: Try](#step-9-try) | |
28 | - [Step A: Metadata, Self-hosting and Interop](#step-a-metadata-self-hosting-and-interop) | |
29 | ||
30 | ||
0f4ca9d1 JM |
31 | ## Pick a language |
32 | ||
33 | You might already have a language in mind that you want to use. | |
34 | Technically speaking, mal can be implemented in any sufficiently | |
dd4020b9 JM |
35 | complete programming language (i.e. Turing complete), however, there |
36 | are a few language features that can make the task MUCH easier. Here | |
37 | are some of them in rough order of importance: | |
0f4ca9d1 JM |
38 | |
39 | * A sequential compound data structure (e.g. arrays, lists, | |
40 | vectors, etc) | |
41 | * An associative compound data structure (e.g. a dictionary, | |
42 | hash-map, associative array, etc) | |
43 | * Function references (first class functions, function pointers, | |
44 | etc) | |
45 | * Real exception handling (try/catch, raise, throw, etc) | |
46 | * Variable argument functions (variadic, var args, splats, apply, etc) | |
47 | * Function closures | |
48 | * PCRE regular expressions | |
49 | ||
50 | In addition, the following will make your task especially easy: | |
51 | ||
bf518367 JM |
52 | * Dynamic typing / boxed types (specifically, the ability to store |
53 | different data types in the sequential and associative structures | |
54 | and the language keeps track of the type for you) | |
0f4ca9d1 JM |
55 | * Compound data types support arbitrary runtime "hidden" data |
56 | (metadata, metatables, dynamic fields attributes) | |
57 | ||
58 | Here are some examples of languages that have all of the above | |
59 | features: JavaScript, Ruby, Python, Lua, R, Clojure. | |
60 | ||
cf70df1f JM |
61 | Michael Fogus has some great blog posts on interesting but less well |
62 | known languages and many of the languages on his lists do not yet have | |
63 | any mal implementations: | |
64 | * http://blog.fogus.me/2011/08/14/perlis-languages/ | |
65 | * http://blog.fogus.me/2011/10/18/programming-language-development-the-past-5-years/ | |
66 | ||
0f4ca9d1 JM |
67 | Many of the most popular languages already have Mal implementations. |
68 | However, this should not discourage you from creating your own | |
69 | implementation in a language that already has one. However, if you go | |
70 | this route, I suggest you avoid referring to the existing | |
71 | implementations (i.e. "cheating") to maximize your learning experience | |
72 | instead of just borrowing mine. On the other hand, if your goal is to | |
73 | add new implementations to mal as efficiently as possible, then you | |
74 | SHOULD find the most similar target language implementation and refer | |
75 | to it frequently. | |
76 | ||
b284f4b8 JM |
77 | If you want a list of programming languages with an |
78 | approximate measure of popularity try the [RedMonk Programming | |
79 | Language | |
80 | Rankings](https://redmonk.com/sogrady/2019/03/20/language-rankings-1-19/) | |
81 | or the [GitHut 2.0 Project](https://madnight.github.io/githut). | |
bf518367 | 82 | |
0f4ca9d1 JM |
83 | |
84 | ## Getting started | |
85 | ||
86 | * Install your chosen language interpreter/compiler, language package | |
87 | manager and build tools (if applicable) | |
88 | ||
89 | * Fork the mal repository on github and then clone your forked | |
90 | repository: | |
91 | ``` | |
92 | git clone git@github.com:YOUR_NAME/mal.git | |
93 | cd mal | |
94 | ``` | |
95 | ||
fc03712f EC |
96 | * Make a new directory for your implementation. For example, if your |
97 | language is called "quux": | |
0f4ca9d1 JM |
98 | ``` |
99 | mkdir quux | |
100 | ``` | |
101 | ||
102 | * Modify the top level Makefile to allow the tests to be run against | |
103 | your implementation. For example, if your language is named "quux" | |
104 | and uses "qx" as the file extension, then make the following | |
105 | 3 modifications to Makefile: | |
106 | ``` | |
107 | IMPLS = ... quux ... | |
108 | ... | |
109 | quux_STEP_TO_PROG = mylang/$($(1)).qx | |
20e8dea0 JM |
110 | ``` |
111 | ||
8c0163f8 | 112 | * Add a "run" script to your implementation directory that listens to |
20e8dea0 | 113 | the "STEP" environment variable for the implementation step to run |
e1b080b4 JL |
114 | and defaults to "stepA_mal". Make sure the run script has the |
115 | executable file permission set (or else the test runner might fail with a | |
116 | permission denied error message). The following are examples of "run" | |
20e8dea0 JM |
117 | scripts for a compiled language and an interpreted language (where |
118 | the interpreter is named "quux"): | |
119 | ||
120 | ``` | |
121 | #!/bin/bash | |
122 | exec $(dirname $0)/${STEP:-stepA_mal} "${@}" | |
123 | ``` | |
124 | ||
125 | ``` | |
126 | #!/bin/bash | |
127 | exec quux $(dirname $0)/${STEP:-stepA_mal}.qx "${@}" | |
0f4ca9d1 JM |
128 | ``` |
129 | ||
130 | This allows you to run tests against your implementation like this: | |
131 | ``` | |
e5737b08 | 132 | make "test^quux^stepX" |
0f4ca9d1 JM |
133 | ``` |
134 | ||
a15f76bd | 135 | If your implementation language is a compiled language, then you |
396d869e | 136 | should also add a Makefile at the top level of your implementation |
a15f76bd JM |
137 | directory. This Makefile will define how to build the files pointed to |
138 | by the quux_STEP_TO_PROG macro. The top-level Makefile will attempt to | |
139 | build those targets before running tests. If it is a scripting | |
396d869e JM |
140 | language/uncompiled, then no Makefile is necessary because |
141 | quux_STEP_TO_PROG will point to a source file that already exists and | |
142 | does not need to be compiled/built. | |
143 | ||
0f4ca9d1 JM |
144 | |
145 | ## General hints | |
146 | ||
147 | Stackoverflow and Google are your best friends. Modern polyglot | |
148 | developers do not memorize dozens of programming languages. Instead, | |
149 | they learn the peculiar terminology used with each language and then | |
150 | use this to search for their answers. | |
151 | ||
152 | Here are some other resources where multiple languages are | |
153 | compared/described: | |
154 | * http://learnxinyminutes.com/ | |
155 | * http://hyperpolyglot.org/ | |
156 | * http://rosettacode.org/ | |
334a71b6 | 157 | * http://rigaux.org/language-study/syntax-across-languages/ |
0f4ca9d1 JM |
158 | |
159 | Do not let yourself be bogged down by specific problems. While the | |
160 | make-a-lisp process is structured as a series of steps, the reality is | |
161 | that building a lisp interpreter is more like a branching tree. If you | |
162 | get stuck on tail call optimization, or hash-maps, move on to other | |
163 | things. You will often have a stroke of inspiration for a problem as | |
164 | you work through other functionality. I have tried to structure this | |
45a8b3ca JM |
165 | guide and the tests to make clear which things can be deferred until |
166 | later. | |
167 | ||
168 | An aside on deferrable/optional bits: when you run the tests for | |
169 | a given step, the last tests are often marked with an "optional" | |
170 | header. This indicates that these are tests for functionality that is | |
171 | not critical to finish a basic mal implementation. Many of the steps | |
172 | in this process guide have a "Deferrable" section, however, it is not | |
173 | quite the same meaning. Those sections include the functionality that | |
174 | is marked as optional in the tests, but they also include | |
0f4ca9d1 JM |
175 | functionality that becomes mandatory at a later step. In other words, |
176 | this is a "make your own Lisp adventure". | |
177 | ||
178 | Use test driven development. Each step of the make-a-lisp process has | |
179 | a bunch of tests associated with it and there is an easy script to run | |
180 | all the tests for a specific step in the process. Pick a failing test, | |
181 | fix it, repeat until all the tests for that step pass. | |
182 | ||
5c052692 WH |
183 | ## Reference Code |
184 | ||
0f4ca9d1 | 185 | The `process` directory contains abbreviated pseudocode and |
a15f76bd | 186 | architecture diagrams for each step of the make-a-lisp process. Use |
0f4ca9d1 | 187 | a textual diff/comparison tool to compare the previous pseudocode step |
a15f76bd JM |
188 | with the one you are working on. The architecture diagram images have |
189 | changes from the previous step highlighted in red. There is also | |
c1ef46cb | 190 | a concise |
b284f4b8 | 191 | [cheatsheet](http://kanaka.github.io/mal/cheatsheet.html) that |
c1ef46cb | 192 | summarizes the key changes at each step. |
0f4ca9d1 | 193 | |
daa1cf3f JM |
194 | If you get completely stuck and are feeling like giving up, then you |
195 | should "cheat" by referring to the same step or functionality in | |
196 | a existing implementation language. You are here to learn, not to take | |
197 | a test, so do not feel bad about it. Okay, you should feel a little | |
198 | bit bad about it. | |
0f4ca9d1 JM |
199 | |
200 | ||
201 | ## The Make-A-Lisp Process | |
202 | ||
dd5c00a9 NB |
203 | Feel free to follow the guide as literally or as loosely as you |
204 | like. You are here to learn; wandering off the beaten path may be the | |
205 | way you learn best. However, each step builds on the previous steps, | |
206 | so if you are new to Lisp or new to your implementation language then | |
207 | you may want to stick more closely to the guide your first time | |
208 | through to avoid frustration at later steps. | |
209 | ||
0f4ca9d1 JM |
210 | In the steps that follow the name of the target language is "quux" and |
211 | the file extension for that language is "qx". | |
212 | ||
213 | ||
214 | <a name="step0"></a> | |
215 | ||
216 | ### Step 0: The REPL | |
217 | ||
218 | ![step0_repl architecture](step0_repl.png) | |
219 | ||
220 | This step is basically just creating a skeleton of your interpreter. | |
221 | ||
222 | * Create a `step0_repl.qx` file in `quux/`. | |
223 | ||
224 | * Add the 4 trivial functions `READ`, `EVAL`, `PRINT`, and `rep` | |
225 | (read-eval-print). `READ`, `EVAL`, and `PRINT` are basically just | |
226 | stubs that return their first parameter (a string if your target | |
227 | language is a statically typed) and `rep` calls them in order | |
228 | passing the return to the input of the next. | |
229 | ||
58ba5af4 JM |
230 | * Add a main loop that repeatedly prints a prompt (needs to be |
231 | "user> " for later tests to pass), gets a line of input from the | |
232 | user, calls `rep` with that line of input, and then prints out the | |
233 | result from `rep`. It should also exit when you send it an EOF | |
234 | (often Ctrl-D). | |
0f4ca9d1 JM |
235 | |
236 | * If you are using a compiled (ahead-of-time rather than just-in-time) | |
237 | language, then create a Makefile (or appropriate project definition | |
238 | file) in your directory. | |
239 | ||
daa1cf3f JM |
240 | It is time to run your first tests. This will check that your program |
241 | does input and output in a way that can be captured by the test | |
242 | harness. Go to the top level and run the following: | |
243 | ``` | |
e5737b08 | 244 | make "test^quux^step0" |
daa1cf3f | 245 | ``` |
0f4ca9d1 JM |
246 | |
247 | Add and then commit your new `step0_repl.qx` and `Makefile` to git. | |
248 | ||
249 | Congratulations! You have just completed the first step of the | |
250 | make-a-lisp process. | |
251 | ||
252 | ||
bd62ff74 | 253 | #### Optional: |
0f4ca9d1 JM |
254 | |
255 | * Add full line editing and command history support to your | |
256 | interpreter REPL. Many languages have a library/module that provide | |
257 | line editing support. Another option if your language supports it is | |
258 | to use an FFI (foreign function interface) to load and call directly | |
94a954f5 | 259 | into GNU readline, editline, or linenoise library. Add line |
0f4ca9d1 JM |
260 | editing interface code to `readline.qx` |
261 | ||
262 | ||
263 | <a name="step1"></a> | |
264 | ||
265 | ### Step 1: Read and Print | |
266 | ||
267 | ![step1_read_print architecture](step1_read_print.png) | |
268 | ||
269 | In this step, your interpreter will "read" the string from the user | |
270 | and parse it into an internal tree data structure (an abstract syntax | |
271 | tree) and then take that data structure and "print" it back to | |
272 | a string. | |
273 | ||
274 | In non-lisp languages, this step (called "lexing and parsing") can be | |
275 | one of the most complicated parts of the compiler/interpreter. In | |
276 | Lisp, the data structure that you want in memory is basically | |
277 | represented directly in the code that the programmer writes | |
278 | (homoiconicity). | |
279 | ||
280 | For example, if the string is "(+ 2 (* 3 4))" then the read function | |
281 | will process this into a tree structure that looks like this: | |
282 | ``` | |
283 | List | |
284 | / | \ | |
285 | / | \ | |
286 | / | \ | |
287 | Sym:+ Int:2 List | |
288 | / | \ | |
289 | / | \ | |
290 | / | \ | |
291 | Sym:* Int:3 Int:4 | |
292 | ``` | |
293 | ||
294 | Each left paren and its matching right paren (lisp "sexpr") becomes | |
295 | a node in the tree and everything else becomes a leaf in the tree. | |
296 | ||
297 | If you can find code for an implementation of a JSON encoder/decoder | |
298 | in your target language then you can probably just borrow and modify | |
299 | that and be 75% of the way done with this step. | |
300 | ||
301 | The rest of this section is going to assume that you are not starting | |
302 | from an existing JSON encoder/decoder, but that you do have access to | |
303 | a Perl compatible regular expressions (PCRE) module/library. You can | |
304 | certainly implement the reader using simple string operations, but it | |
305 | is more involved. The `make`, `ps` (postscript) and Haskell | |
306 | implementations have examples of a reader/parser without using regular | |
307 | expression support. | |
308 | ||
309 | * Copy `step0_repl.qx` to `step1_read_print.qx`. | |
310 | ||
311 | * Add a `reader.qx` file to hold functions related to the reader. | |
312 | ||
313 | * If the target language has objects types (OOP), then the next step | |
314 | is to create a simple stateful Reader object in `reader.qx`. This | |
315 | object will store the tokens and a position. The Reader object will | |
d717d991 | 316 | have two methods: `next` and `peek`. `next` returns the token at |
0f4ca9d1 JM |
317 | the current position and increments the position. `peek` just |
318 | returns the token at the current position. | |
319 | ||
320 | * Add a function `read_str` in `reader.qx`. This function | |
f8880251 | 321 | will call `tokenize` and then create a new Reader object instance |
0f4ca9d1 JM |
322 | with the tokens. Then it will call `read_form` with the Reader |
323 | instance. | |
324 | ||
f8880251 | 325 | * Add a function `tokenize` in `reader.qx`. This function will take |
2315fd53 | 326 | a single string and return an array/list |
0f4ca9d1 JM |
327 | of all the tokens (strings) in it. The following regular expression |
328 | (PCRE) will match all mal tokens. | |
329 | ``` | |
4aa0ebdf | 330 | [\s,]*(~@|[\[\]{}()'`~^@]|"(?:\\.|[^\\"])*"?|;.*|[^\s\[\]{}('"`,;)]*) |
0f4ca9d1 | 331 | ``` |
cfdf00cc OR |
332 | * For each match captured within the parenthesis starting at char 6 of the |
333 | regular expression a new token will be created. | |
334 | ||
335 | * `[\s,]*`: Matches any number of whitespaces or commas. This is not captured | |
336 | so it will be ignored and not tokenized. | |
337 | ||
338 | * `~@`: Captures the special two-characters `~@` (tokenized). | |
339 | ||
340 | * ```[\[\]{}()'`~^@]```: Captures any special single character, one of | |
185052f1 | 341 | ```[]{}()'`~^@``` (tokenized). |
cfdf00cc | 342 | |
4aa0ebdf | 343 | * `"(?:\\.|[^\\"])*"?`: Starts capturing at a double-quote and stops at the |
c5dd7ecb | 344 | next double-quote unless it was preceded by a backslash in which case it |
4aa0ebdf JM |
345 | includes it until the next double-quote (tokenized). It will also |
346 | match unbalanced strings (no ending double-quote) which should be | |
347 | reported as an error. | |
cfdf00cc OR |
348 | |
349 | * `;.*`: Captures any sequence of characters starting with `;` (tokenized). | |
350 | ||
351 | * ```[^\s\[\]{}('"`,;)]*```: Captures a sequence of zero or more non special | |
352 | characters (e.g. symbols, numbers, "true", "false", and "nil") and is sort | |
353 | of the inverse of the one above that captures special characters (tokenized). | |
0f4ca9d1 JM |
354 | |
355 | * Add the function `read_form` to `reader.qx`. This function | |
356 | will peek at the first token in the Reader object and switch on the | |
357 | first character of that token. If the character is a left paren then | |
358 | `read_list` is called with the Reader object. Otherwise, `read_atom` | |
359 | is called with the Reader Object. The return value from `read_form` | |
360 | is a mal data type. If your target language is statically typed then | |
361 | you will need some way for `read_form` to return a variant or | |
362 | subclass type. For example, if your language is object oriented, | |
2315fd53 | 363 | then you can define a top level MalType (in `types.qx`) that all |
0f4ca9d1 | 364 | your mal data types inherit from. The MalList type (which also |
cc52842b | 365 | inherits from MalType) will contain a list/array of other MalTypes. |
0f4ca9d1 JM |
366 | If your language is dynamically typed then you can likely just |
367 | return a plain list/array of other mal types. | |
368 | ||
369 | * Add the function `read_list` to `reader.qx`. This function will | |
370 | repeatedly call `read_form` with the Reader object until it | |
371 | encounters a ')' token (if it reach EOF before reading a ')' then | |
372 | that is an error). It accumulates the results into a List type. If | |
373 | your language does not have a sequential data type that can hold mal | |
374 | type values you may need to implement one (in `types.qx`). Note | |
375 | that `read_list` repeatedly calls `read_form` rather than | |
a85b8454 | 376 | `read_atom`. This mutually recursive definition between `read_list` |
0f4ca9d1 JM |
377 | and `read_form` is what allows lists to contain lists. |
378 | ||
379 | * Add the function `read_atom` to `reader.qx`. This function will | |
380 | look at the contents of the token and return the appropriate scalar | |
381 | (simple/single) data type value. Initially, you can just implement | |
b84a50cc | 382 | numbers (integers) and symbols. This will allow you to proceed |
0f4ca9d1 JM |
383 | through the next couple of steps before you will need to implement |
384 | the other fundamental mal types: nil, true, false, and string. The | |
96065d55 BH |
385 | remaining scalar mal type, keyword does not |
386 | need to be implemented until step A (but can be implemented at any | |
0f4ca9d1 JM |
387 | point between this step and that). BTW, symbols types are just an |
388 | object that contains a single string name value (some languages have | |
389 | symbol types already). | |
390 | ||
391 | * Add a file `printer.qx`. This file will contain a single function | |
392 | `pr_str` which does the opposite of `read_str`: take a mal data | |
393 | structure and return a string representation of it. But `pr_str` is | |
394 | much simpler and is basically just a switch statement on the type of | |
395 | the input object: | |
396 | ||
397 | * symbol: return the string name of the symbol | |
398 | * number: return the number as a string | |
399 | * list: iterate through each element of the list calling `pr_str` on | |
400 | it, then join the results with a space separator, and surround the | |
401 | final result with parens | |
402 | ||
403 | * Change the `READ` function in `step1_read_print.qx` to call | |
404 | `reader.read_str` and the `PRINT` function to call `printer.pr_str`. | |
405 | `EVAL` continues to simply return its input but the type is now | |
406 | a mal data type. | |
407 | ||
408 | You now have enough hooked up to begin testing your code. You can | |
409 | manually try some simple inputs: | |
410 | * `123` -> `123` | |
411 | * ` 123 ` -> `123` | |
412 | * `abc` -> `abc` | |
413 | * ` abc ` -> `abc` | |
414 | * `(123 456)` -> `(123 456)` | |
415 | * `( 123 456 789 ) ` -> `(123 456 789)` | |
6a767b0d | 416 | * `( + 2 (* 3 4) ) ` -> `(+ 2 (* 3 4))` |
0f4ca9d1 JM |
417 | |
418 | To verify that your code is doing more than just eliminating extra | |
419 | spaces (and not failing), you can instrument your `reader.qx` functions. | |
420 | ||
421 | Once you have gotten past those simple manual tests, it is time to run | |
422 | the full suite of step 1 tests. Go to the top level and run the | |
423 | following: | |
424 | ``` | |
e5737b08 | 425 | make "test^quux^step1" |
0f4ca9d1 JM |
426 | ``` |
427 | ||
428 | Fix any test failures related to symbols, numbers and lists. | |
429 | ||
430 | Depending on the functionality of your target language, it is likely | |
431 | that you have now just completed one of the most difficult steps. It | |
432 | is down hill from here. The remaining steps will probably be easier | |
433 | and each step will give progressively more bang for the buck. | |
434 | ||
45a8b3ca | 435 | #### Deferrable: |
0f4ca9d1 JM |
436 | |
437 | ||
0f4ca9d1 | 438 | * Add support for the other basic data type to your reader and printer |
1432efa2 BH |
439 | functions: string, nil, true, and false. Nil, true, and false |
440 | become mandatory at step 4, strings at step 6. When a string is read, | |
441 | the following transformations are | |
8d78bc26 JM |
442 | applied: a backslash followed by a doublequote is translated into |
443 | a plain doublequote character, a backslash followed by "n" is | |
444 | translated into a newline, and a backslash followed by another | |
445 | backslash is translated into a single backslash. To properly print | |
446 | a string (for step 4 string functions), the `pr_str` function needs | |
447 | another parameter called `print_readably`. When `print_readably` is | |
448 | true, doublequotes, newlines, and backslashes are translated into | |
449 | their printed representations (the reverse of the reader). The | |
450 | `PRINT` function in the main program should call `pr_str` with | |
451 | print_readably set to true. | |
0f4ca9d1 | 452 | |
4b7ee8fa JM |
453 | * Add error checking to your reader functions to make sure parens |
454 | are properly matched. Catch and print these errors in your main | |
455 | loop. If your language does not have try/catch style bubble up | |
456 | exception handling, then you will need to add explicit error | |
457 | handling to your code to catch and pass on errors without crashing. | |
458 | ||
459 | * Add support for reader macros which are forms that are | |
460 | transformed into other forms during the read phase. Refer to | |
461 | `tests/step1_read_print.mal` for the form that these macros should | |
462 | take (they are just simple transformations of the token stream). | |
463 | ||
1c2eab90 | 464 | * Add support for the other mal types: keyword, vector, hash-map. |
2345a3da JM |
465 | * keyword: a keyword is a token that begins with a colon. A keyword |
466 | can just be stored as a string with special unicode prefix like | |
467 | 0x29E (or char 0xff/127 if the target language does not have good | |
468 | unicode support) and the printer translates strings with that | |
469 | prefix back to the keyword representation. This makes it easy to | |
470 | use keywords as hash map keys in most languages. You can also | |
471 | store keywords as a unique data type, but you will need to make | |
472 | sure they can be used as hash map keys (which may involve doing | |
473 | a similar prefixed translation anyways). | |
474 | * vector: a vector can be implemented with same underlying | |
475 | type as a list as long as there is some mechanism to keep track of | |
476 | the difference. You can use the same reader function for both | |
477 | lists and vectors by adding parameters for the starting and ending | |
478 | tokens. | |
479 | * hash-map: a hash-map is an associative data structure that maps | |
480 | strings to other mal values. If you implement keywords as prefixed | |
481 | strings, then you only need a native associative data structure | |
482 | which supports string keys. Clojure allows any value to be a hash | |
483 | map key, but the base functionality in mal is to support strings | |
484 | and keyword keys. Because of the representation of hash-maps as | |
485 | an alternating sequence of keys and values, you can probably use | |
486 | the same reader function for hash-maps as lists and vectors with | |
487 | parameters to indicate the starting and ending tokens. The odd | |
488 | tokens are then used for keys with the corresponding even tokens | |
489 | as the values. | |
490 | ||
0f4ca9d1 JM |
491 | * Add comment support to your reader. The tokenizer should ignore |
492 | tokens that start with ";". Your `read_str` function will need to | |
493 | properly handle when the tokenizer returns no values. The simplest | |
494 | way to do this is to return `nil` mal value. A cleaner option (that | |
495 | does not print `nil` at the prompt is to throw a special exception | |
496 | that causes the main loop to simply continue at the beginning of the | |
497 | loop without calling `rep`. | |
498 | ||
499 | ||
500 | <a name="step2"></a> | |
501 | ||
502 | ### Step 2: Eval | |
503 | ||
504 | ![step2_eval architecture](step2_eval.png) | |
505 | ||
506 | In step 1 your mal interpreter was basically just a way to validate | |
507 | input and eliminate extraneous white space. In this step you will turn | |
508 | your interpreter into a simple number calculator by adding | |
509 | functionality to the evaluator (`EVAL`). | |
510 | ||
511 | Compare the pseudocode for step 1 and step 2 to get a basic idea of | |
512 | the changes that will be made during this step: | |
513 | ``` | |
514 | diff -urp ../process/step1_read_print.txt ../process/step2_eval.txt | |
515 | ``` | |
516 | ||
517 | * Copy `step1_read_print.qx` to `step2_eval.qx`. | |
518 | ||
519 | * Define a simple initial REPL environment. This environment is an | |
520 | associative structure that maps symbols (or symbol names) to | |
521 | numeric functions. For example, in python this would look something | |
522 | like this: | |
523 | ``` | |
524 | repl_env = {'+': lambda a,b: a+b, | |
525 | '-': lambda a,b: a-b, | |
526 | '*': lambda a,b: a*b, | |
527 | '/': lambda a,b: int(a/b)} | |
528 | ``` | |
529 | ||
530 | * Modify the `rep` function to pass the REPL environment as the second | |
531 | parameter for the `EVAL` call. | |
532 | ||
533 | * Create a new function `eval_ast` which takes `ast` (mal data type) | |
534 | and an associative structure (the environment from above). | |
535 | `eval_ast` switches on the type of `ast` as follows: | |
536 | ||
537 | * symbol: lookup the symbol in the environment structure and return | |
0198b7a2 | 538 | the value or raise an error if no value is found |
0f4ca9d1 JM |
539 | * list: return a new list that is the result of calling `EVAL` on |
540 | each of the members of the list | |
541 | * otherwise just return the original `ast` value | |
542 | ||
543 | * Modify `EVAL` to check if the first parameter `ast` is a list. | |
544 | * `ast` is not a list: then return the result of calling `eval_ast` | |
545 | on it. | |
efa2daef | 546 | * `ast` is a empty list: return ast unchanged. |
0f4ca9d1 JM |
547 | * `ast` is a list: call `eval_ast` to get a new evaluated list. Take |
548 | the first item of the evaluated list and call it as function using | |
549 | the rest of the evaluated list as its arguments. | |
550 | ||
551 | If your target language does not have full variable length argument | |
552 | support (e.g. variadic, vararg, splats, apply) then you will need to | |
553 | pass the full list of arguments as a single parameter and split apart | |
554 | the individual values inside of every mal function. This is annoying, | |
555 | but workable. | |
556 | ||
557 | The process of taking a list and invoking or executing it to return | |
558 | something new is known in Lisp as the "apply" phase. | |
559 | ||
560 | Try some simple expressions: | |
561 | ||
562 | * `(+ 2 3)` -> `5` | |
563 | * `(+ 2 (* 3 4))` -> `14` | |
564 | ||
565 | The most likely challenge you will encounter is how to properly call | |
566 | a function references using an arguments list. | |
567 | ||
568 | Now go to the top level, run the step 2 tests and fix the errors. | |
569 | ``` | |
e5737b08 | 570 | make "test^quux^step2" |
0f4ca9d1 JM |
571 | ``` |
572 | ||
573 | You now have a simple prefix notation calculator! | |
574 | ||
0748211b DM |
575 | #### Deferrable: |
576 | ||
577 | * `eval_ast` should evaluate elements of vectors and hash-maps. Add the | |
578 | following cases in `eval_ast`: | |
579 | * If `ast` is a vector: return a new vector that is the result of calling | |
580 | `EVAL` on each of the members of the vector. | |
581 | * If `ast` is a hash-map: return a new hash-map which consists of key-value | |
582 | pairs where the key is a key from the hash-map and the value is the result | |
583 | of calling `EVAL` on the corresponding value. | |
584 | ||
0f4ca9d1 JM |
585 | |
586 | <a name="step3"></a> | |
587 | ||
588 | ### Step 3: Environments | |
589 | ||
590 | ![step3_env architecture](step3_env.png) | |
591 | ||
592 | In step 2 you were already introduced to REPL environment (`repl_env`) | |
593 | where the basic numeric functions were stored and looked up. In this | |
594 | step you will add the ability to create new environments (`let*`) and | |
daa1cf3f | 595 | modify existing environments (`def!`). |
0f4ca9d1 JM |
596 | |
597 | A Lisp environment is an associative data structure that maps symbols (the | |
598 | keys) to values. But Lisp environments have an additional important | |
599 | function: they can refer to another environment (the outer | |
600 | environment). During environment lookups, if the current environment | |
601 | does not have the symbol, the lookup continues in the outer | |
602 | environment, and continues this way until the symbol is either found, | |
603 | or the outer environment is `nil` (the outermost environment in the | |
604 | chain). | |
605 | ||
606 | Compare the pseudocode for step 2 and step 3 to get a basic idea of | |
607 | the changes that will be made during this step: | |
608 | ``` | |
609 | diff -urp ../process/step2_eval.txt ../process/step3_env.txt | |
610 | ``` | |
611 | ||
d9c020b0 | 612 | * Copy `step2_eval.qx` to `step3_env.qx`. |
0f4ca9d1 JM |
613 | |
614 | * Create `env.qx` to hold the environment definition. | |
615 | ||
616 | * Define an `Env` object that is instantiated with a single `outer` | |
617 | parameter and starts with an empty associative data structure | |
618 | property `data`. | |
619 | ||
620 | * Define three methods for the Env object: | |
621 | * set: takes a symbol key and a mal value and adds to the `data` | |
622 | structure | |
623 | * find: takes a symbol key and if the current environment contains | |
624 | that key then return the environment. If no key is found and outer | |
625 | is not `nil` then call find (recurse) on the outer environment. | |
626 | * get: takes a symbol key and uses the `find` method to locate the | |
627 | environment with the key, then returns the matching value. If no | |
628 | key is found up the outer chain, then throws/raises a "not found" | |
629 | error. | |
630 | ||
169ddeb2 | 631 | * Update `step3_env.qx` to use the new `Env` type to create the |
0f4ca9d1 JM |
632 | repl_env (with a `nil` outer value) and use the `set` method to add |
633 | the numeric functions. | |
634 | ||
635 | * Modify `eval_ast` to call the `get` method on the `env` parameter. | |
636 | ||
637 | * Modify the apply section of `EVAL` to switch on the first element of | |
638 | the list: | |
639 | * symbol "def!": call the set method of the current environment | |
640 | (second parameter of `EVAL` called `env`) using the unevaluated | |
641 | first parameter (second list element) as the symbol key and the | |
642 | evaluated second parameter as the value. | |
a727d6e8 | 643 | * symbol "let\*": create a new environment using the current |
0f4ca9d1 | 644 | environment as the outer value and then use the first parameter as |
a727d6e8 MK |
645 | a list of new bindings in the "let\*" environment. Take the second |
646 | element of the binding list, call `EVAL` using the new "let\*" | |
0f4ca9d1 | 647 | environment as the evaluation environment, then call `set` on the |
a727d6e8 | 648 | "let\*" environment using the first binding list element as the key |
0f4ca9d1 JM |
649 | and the evaluated second element as the value. This is repeated |
650 | for each odd/even pair in the binding list. Note in particular, | |
651 | the bindings earlier in the list can be referred to by later | |
652 | bindings. Finally, the second parameter (third element) of the | |
a727d6e8 | 653 | original `let*` form is evaluated using the new "let\*" environment |
0f4ca9d1 JM |
654 | and the result is returned as the result of the `let*` (the new |
655 | let environment is discarded upon completion). | |
656 | * otherwise: call `eval_ast` on the list and apply the first element | |
657 | to the rest as before. | |
658 | ||
659 | `def!` and `let*` are Lisp "specials" (or "special atoms") which means | |
660 | that they are language level features and more specifically that the | |
661 | rest of the list elements (arguments) may be evaluated differently (or | |
662 | not at all) unlike the default apply case where all elements of the | |
663 | list are evaluated before the first element is invoked. Lists which | |
664 | contain a "special" as the first element are known as "special forms". | |
34935988 | 665 | They are special because they follow special evaluation rules. |
0f4ca9d1 JM |
666 | |
667 | Try some simple environment tests: | |
668 | ||
669 | * `(def! a 6)` -> `6` | |
670 | * `a` -> `6` | |
671 | * `(def! b (+ a 2))` -> `8` | |
672 | * `(+ a b)` -> `14` | |
673 | * `(let* (c 2) c)` -> `2` | |
674 | ||
675 | Now go to the top level, run the step 3 tests and fix the errors. | |
676 | ``` | |
e5737b08 | 677 | make "test^quux^step3" |
0f4ca9d1 JM |
678 | ``` |
679 | ||
680 | You mal implementation is still basically just a numeric calculator | |
681 | with save/restore capability. But you have set the foundation for step | |
682 | 4 where it will begin to feel like a real programming language. | |
683 | ||
684 | ||
685 | An aside on mutation and typing: | |
686 | ||
687 | The "!" suffix on symbols is used to indicate that this symbol refers | |
688 | to a function that mutates something else. In this case, the `def!` | |
689 | symbol indicates a special form that will mutate the current | |
690 | environment. Many (maybe even most) of runtime problems that are | |
691 | encountered in software engineering are a result of mutation. By | |
692 | clearly marking code where mutation may occur, you can more easily | |
693 | track down the likely cause of runtime problems when they do occur. | |
694 | ||
695 | Another cause of runtime errors is type errors, where a value of one | |
696 | type is unexpectedly treated by the program as a different and | |
697 | incompatible type. Statically typed languages try to make the | |
698 | programmer solve all type problems before the program is allowed to | |
699 | run. Most Lisp variants tend to be dynamically typed (types of values | |
700 | are checked when they are actually used at runtime). | |
701 | ||
702 | As an aside-aside: The great debate between static and dynamic typing | |
4e3fd73b MK |
703 | can be understood by following the money. Advocates of strict static |
704 | typing use words like "correctness" and "safety" and thus get | |
0f4ca9d1 JM |
705 | government and academic funding. Advocates of dynamic typing use words |
706 | like "agile" and "time-to-market" and thus get venture capital and | |
707 | commercial funding. | |
708 | ||
709 | ||
710 | <a name="step4"></a> | |
711 | ||
712 | ### Step 4: If Fn Do | |
713 | ||
714 | ![step4_if_fn_do architecture](step4_if_fn_do.png) | |
715 | ||
716 | In step 3 you added environments and the special forms for | |
717 | manipulating environments. In this step you will add 3 new special | |
718 | forms (`if`, `fn*` and `do`) and add several more core functions to | |
719 | the default REPL environment. Our new architecture will look like | |
720 | this: | |
721 | ||
722 | The `fn*` special form is how new user-defined functions are created. | |
723 | In some Lisps, this special form is named "lambda". | |
724 | ||
725 | Compare the pseudocode for step 3 and step 4 to get a basic idea of | |
726 | the changes that will be made during this step: | |
727 | ``` | |
728 | diff -urp ../process/step3_env.txt ../process/step4_if_fn_do.txt | |
729 | ``` | |
730 | ||
731 | * Copy `step3_env.qx` to `step4_if_fn_do.qx`. | |
732 | ||
733 | * If you have not implemented reader and printer support (and data | |
734 | types) for `nil`, `true` and `false`, you will need to do so for | |
735 | this step. | |
736 | ||
737 | * Update the constructor/initializer for environments to take two new | |
738 | arguments: `binds` and `exprs`. Bind (`set`) each element (symbol) | |
739 | of the binds list to the respective element of the `exprs` list. | |
740 | ||
741 | * Add support to `printer.qx` to print functions values. A string | |
31171098 | 742 | literal like "#\<function>" is sufficient. |
0f4ca9d1 | 743 | |
1af1f7f1 | 744 | * Add the following special forms to `EVAL`: |
0f4ca9d1 | 745 | |
1af1f7f1 | 746 | * `do`: Evaluate all the elements of the list using `eval_ast` |
f558a0c8 | 747 | and return the final evaluated element. |
0f4ca9d1 JM |
748 | * `if`: Evaluate the first parameter (second element). If the result |
749 | (condition) is anything other than `nil` or `false`, then evaluate | |
1af1f7f1 | 750 | the second parameter (third element of the list) and return the |
0f4ca9d1 JM |
751 | result. Otherwise, evaluate the third parameter (fourth element) |
752 | and return the result. If condition is false and there is no third | |
753 | parameter, then just return `nil`. | |
754 | * `fn*`: Return a new function closure. The body of that closure | |
755 | does the following: | |
756 | * Create a new environment using `env` (closed over from outer | |
757 | scope) as the `outer` parameter, the first parameter (second | |
758 | list element of `ast` from the outer scope) as the `binds` | |
759 | parameter, and the parameters to the closure as the `exprs` | |
760 | parameter. | |
761 | * Call `EVAL` on the second parameter (third list element of `ast` | |
762 | from outer scope), using the new environment. Use the result as | |
763 | the return value of the closure. | |
764 | ||
765 | If your target language does not support closures, then you will need | |
766 | to implement `fn*` using some sort of structure or object that stores | |
767 | the values being closed over: the first and second elements of the | |
768 | `ast` list (function parameter list and function body) and the current | |
769 | environment `env`. In this case, your native functions will need to be | |
770 | wrapped in the same way. You will probably also need a method/function | |
771 | that invokes your function object/structure for the default case of | |
772 | the apply section of `EVAL`. | |
773 | ||
774 | Try out the basic functionality you have implemented: | |
775 | ||
8e796aa4 BH |
776 | * `(fn* (a) a)` -> `#<function>` |
777 | * `( (fn* (a) a) 7)` -> `7` | |
778 | * `( (fn* (a) (+ a 1)) 10)` -> `11` | |
779 | * `( (fn* (a b) (+ a b)) 2 3)` -> `5` | |
0f4ca9d1 JM |
780 | |
781 | * Add a new file `core.qx` and define an associative data structure | |
782 | `ns` (namespace) that maps symbols to functions. Move the numeric | |
783 | function definitions into this structure. | |
784 | ||
785 | * Modify `step4_if_fn_do.qx` to iterate through the `core.ns` | |
786 | structure and add (`set`) each symbol/function mapping to the | |
787 | REPL environment (`repl_env`). | |
788 | ||
789 | * Add the following functions to `core.ns`: | |
4b7ee8fa JM |
790 | * `prn`: call `pr_str` on the first parameter with `print_readably` |
791 | set to true, prints the result to the screen and then return | |
a1eb30fc | 792 | `nil`. Note that the full version of `prn` is a deferrable below. |
0f4ca9d1 JM |
793 | * `list`: take the parameters and return them as a list. |
794 | * `list?`: return true if the first parameter is a list, false | |
795 | otherwise. | |
796 | * `empty?`: treat the first parameter as a list and return true if | |
797 | the list is empty and false if it contains any elements. | |
798 | * `count`: treat the first parameter as a list and return the number | |
799 | of elements that it contains. | |
800 | * `=`: compare the first two parameters and return true if they are | |
801 | the same type and contain the same value. In the case of equal | |
802 | length lists, each element of the list should be compared for | |
803 | equality and if they are the same return true, otherwise false. | |
804 | * `<`, `<=`, `>`, and `>=`: treat the first two parameters as | |
805 | numbers and do the corresponding numeric comparison, returning | |
806 | either true or false. | |
807 | ||
808 | Now go to the top level, run the step 4 tests. There are a lot of | |
809 | tests in step 4 but all of the non-optional tests that do not involve | |
810 | strings should be able to pass now. | |
811 | ||
812 | ``` | |
e5737b08 | 813 | make "test^quux^step4" |
0f4ca9d1 JM |
814 | ``` |
815 | ||
816 | Your mal implementation is already beginning to look like a real | |
817 | language. You have flow control, conditionals, user-defined functions | |
818 | with lexical scope, side-effects (if you implement the string | |
1380d8c1 JM |
819 | functions), etc. However, our little interpreter has not quite reached |
820 | Lisp-ness yet. The next several steps will take your implementation | |
821 | from a neat toy to a full featured language. | |
0f4ca9d1 | 822 | |
45a8b3ca | 823 | #### Deferrable: |
0f4ca9d1 JM |
824 | |
825 | * Implement Clojure-style variadic function parameters. Modify the | |
826 | constructor/initializer for environments, so that if a "&" symbol is | |
827 | encountered in the `binds` list, the next symbol in the `binds` list | |
828 | after the "&" is bound to the rest of the `exprs` list that has not | |
829 | been bound yet. | |
830 | ||
1c2eab90 | 831 | * Define a `not` function using mal itself. In `step4_if_fn_do.qx` |
0f4ca9d1 JM |
832 | call the `rep` function with this string: |
833 | "(def! not (fn* (a) (if a false true)))". | |
834 | ||
835 | * Implement the strings functions in `core.qx`. To implement these | |
836 | functions, you will need to implement the string support in the | |
45a8b3ca | 837 | reader and printer (deferrable section of step 1). Each of the string |
0f4ca9d1 JM |
838 | functions takes multiple mal values, prints them (`pr_str`) and |
839 | joins them together into a new string. | |
840 | * `pr-str`: calls `pr_str` on each argument with `print_readably` | |
841 | set to true, joins the results with " " and returns the new | |
842 | string. | |
843 | * `str`: calls `pr_str` on each argument with `print_readably` set | |
844 | to false, concatenates the results together ("" separator), and | |
845 | returns the new string. | |
846 | * `prn`: calls `pr_str` on each argument with `print_readably` set | |
847 | to true, joins the results with " ", prints the string to the | |
848 | screen and then returns `nil`. | |
849 | * `println`: calls `pr_str` on each argument with `print_readably` set | |
850 | to false, joins the results with " ", prints the string to the | |
851 | screen and then returns `nil`. | |
852 | ||
853 | ||
854 | <a name="step5"></a> | |
855 | ||
856 | ### Step 5: Tail call optimization | |
857 | ||
858 | ![step5_tco architecture](step5_tco.png) | |
859 | ||
860 | In step 4 you added special forms `do`, `if` and `fn*` and you defined | |
861 | some core functions. In this step you will add a Lisp feature called | |
862 | tail call optimization (TCO). Also called "tail recursion" or | |
863 | sometimes just "tail calls". | |
864 | ||
865 | Several of the special forms that you have defined in `EVAL` end up | |
866 | calling back into `EVAL`. For those forms that call `EVAL` as the last | |
867 | thing that they do before returning (tail call) you will just loop back | |
868 | to the beginning of eval rather than calling it again. The advantage | |
869 | of this approach is that it avoids adding more frames to the call | |
1c2eab90 JK |
870 | stack. This is especially important in Lisp languages because they tend |
871 | to prefer using recursion instead of iteration for control structures. | |
872 | (Though some Lisps, such as Common Lisp, have iteration.) However, with | |
873 | tail call optimization, recursion can be made as stack efficient as | |
874 | iteration. | |
0f4ca9d1 JM |
875 | |
876 | Compare the pseudocode for step 4 and step 5 to get a basic idea of | |
877 | the changes that will be made during this step: | |
878 | ``` | |
879 | diff -urp ../process/step4_if_fn_do.txt ../process/step5_tco.txt | |
880 | ``` | |
881 | ||
0303d604 | 882 | * Copy `step4_if_fn_do.qx` to `step5_tco.qx`. |
0f4ca9d1 JM |
883 | |
884 | * Add a loop (e.g. while true) around all code in `EVAL`. | |
885 | ||
886 | * Modify each of the following form cases to add tail call recursion | |
887 | support: | |
888 | * `let*`: remove the final `EVAL` call on the second `ast` argument | |
889 | (third list element). Set `env` (i.e. the local variable passed in | |
890 | as second parameter of `EVAL`) to the new let environment. Set | |
891 | `ast` (i.e. the local variable passed in as first parameter of | |
892 | `EVAL`) to be the second `ast` argument. Continue at the beginning | |
893 | of the loop (no return). | |
894 | * `do`: change the `eval_ast` call to evaluate all the parameters | |
4e7296f9 JM |
895 | except for the last (2nd list element up to but not including |
896 | last). Set `ast` to the last element of `ast`. Continue | |
0f4ca9d1 JM |
897 | at the beginning of the loop (`env` stays unchanged). |
898 | * `if`: the condition continues to be evaluated, however, rather | |
899 | than evaluating the true or false branch, `ast` is set to the | |
900 | unevaluated value of the chosen branch. Continue at the beginning | |
901 | of the loop (`env` is unchanged). | |
902 | ||
903 | * The return value from the `fn*` special form will now become an | |
904 | object/structure with attributes that allow the default invoke case | |
905 | of `EVAL` to do TCO on mal functions. Those attributes are: | |
0f4ca9d1 JM |
906 | * `ast`: the second `ast` argument (third list element) representing |
907 | the body of the function. | |
908 | * `params`: the first `ast` argument (second list element) | |
909 | representing the parameter names of the function. | |
910 | * `env`: the current value of the `env` parameter of `EVAL`. | |
4e7296f9 JM |
911 | * `fn`: the original function value (i.e. what was return by `fn*` |
912 | in step 4). Note that this is deferrable until step 9 when it is | |
16ca60da JM |
913 | required for the `map` and `apply` core functions). You will also |
914 | need it in step 6 if you choose to not to defer atoms/`swap!` from | |
915 | that step. | |
0f4ca9d1 JM |
916 | |
917 | * The default "apply"/invoke case of `EVAL` must now be changed to | |
918 | account for the new object/structure returned by the `fn*` form. | |
6af96887 GL |
919 | Continue to call `eval_ast` on `ast`. The first element of the |
920 | result of `eval_ast` is `f` and the remaining elements are in `args`. | |
0f4ca9d1 JM |
921 | Switch on the type of `f`: |
922 | * regular function (not one defined by `fn*`): apply/invoke it as | |
4e7296f9 | 923 | before (in step 4). |
0f4ca9d1 JM |
924 | * a `fn*` value: set `ast` to the `ast` attribute of `f`. Generate |
925 | a new environment using the `env` and `params` attributes of `f` | |
6af96887 GL |
926 | as the `outer` and `binds` arguments and `args` as the `exprs` |
927 | argument. Set `env` to the new environment. Continue at the | |
928 | beginning of the loop. | |
0f4ca9d1 JM |
929 | |
930 | Run some manual tests from previous steps to make sure you have not | |
931 | broken anything by adding TCO. | |
932 | ||
933 | Now go to the top level, run the step 5 tests. | |
934 | ||
935 | ``` | |
e5737b08 | 936 | make "test^quux^step5" |
0f4ca9d1 JM |
937 | ``` |
938 | ||
939 | Look at the step 5 test file `tests/step5_tco.mal`. The `sum-to` | |
940 | function cannot be tail call optimized because it does something after | |
941 | the recursive call (`sum-to` calls itself and then does the addition). | |
942 | Lispers say that the `sum-to` is not in tail position. The `sum2` | |
943 | function however, calls itself from tail position. In other words, the | |
944 | recursive call to `sum2` is the last action that `sum2` does. Calling | |
945 | `sum-to` with a large value will cause a stack overflow exception in | |
946 | most target languages (some have super-special tricks they use to | |
947 | avoid stack overflows). | |
948 | ||
949 | Congratulations, your mal implementation already has a feature (TCO) | |
950 | that most mainstream languages lack. | |
951 | ||
952 | ||
953 | <a name="step6"></a> | |
954 | ||
396d869e | 955 | ### Step 6: Files, Mutation, and Evil |
0f4ca9d1 JM |
956 | |
957 | ![step6_file architecture](step6_file.png) | |
958 | ||
959 | In step 5 you added tail call optimization. In this step you will add | |
960 | some string and file operations and give your implementation a touch | |
961 | of evil ... er, eval. And as long as your language supports function | |
962 | closures, this step will be quite simple. However, to complete this | |
963 | step, you must implement string type support, so if you have been | |
964 | holding off on that you will need to go back and do so. | |
965 | ||
966 | Compare the pseudocode for step 5 and step 6 to get a basic idea of | |
967 | the changes that will be made during this step: | |
968 | ``` | |
969 | diff -urp ../process/step5_tco.txt ../process/step6_file.txt | |
970 | ``` | |
971 | ||
ffd31966 JM |
972 | * Copy `step5_tco.qx` to `step6_file.qx`. |
973 | ||
0f4ca9d1 JM |
974 | * Add two new string functions to the core namespaces: |
975 | * `read-string`: this function just exposes the `read_str` function | |
976 | from the reader. If your mal string type is not the same as your | |
977 | target language (e.g. statically typed language) then your | |
978 | `read-string` function will need to unbox (extract) the raw string | |
979 | from the mal string type in order to call `read_str`. | |
980 | * `slurp`: this function takes a file name (string) and returns the | |
981 | contents of the file as a string. Once again, if your mal string | |
982 | type wraps a raw target language string, then you will need to | |
983 | unmarshall (extract) the string parameter to get the raw file name | |
984 | string and marshall (wrap) the result back to a mal string type. | |
985 | ||
6fef8e58 JM |
986 | * In your main program, add a new symbol "eval" to your REPL |
987 | environment. The value of this new entry is a function that takes | |
988 | a single argument `ast`. The closure calls the your `EVAL` function | |
989 | using the `ast` as the first argument and the REPL environment | |
990 | (closed over from outside) as the second argument. The result of | |
991 | the `EVAL` call is returned. This simple but powerful addition | |
992 | allows your program to treat mal data as a mal program. For example, | |
cfdf00cc | 993 | you can now to this: |
6fef8e58 JM |
994 | ``` |
995 | (def! mal-prog (list + 1 2)) | |
996 | (eval mal-prog) | |
997 | ``` | |
0f4ca9d1 JM |
998 | |
999 | * Define a `load-file` function using mal itself. In your main | |
1000 | program call the `rep` function with this string: | |
e6d41de4 | 1001 | "(def! load-file (fn* (f) (eval (read-string (str \"(do \" (slurp f) \"\nnil)\")))))". |
0f4ca9d1 JM |
1002 | |
1003 | Try out `load-file`: | |
1004 | * `(load-file "../tests/incA.mal")` -> `9` | |
1005 | * `(inc4 3)` -> `7` | |
1006 | ||
1007 | The `load-file` function does the following: | |
1008 | * Call `slurp` to read in a file by name. Surround the contents with | |
1009 | "(do ...)" so that the whole file will be treated as a single | |
e6d41de4 NB |
1010 | program AST (abstract syntax tree). Add a new line in case the files |
1011 | ends with a comment. The `nil` ensures a short and predictable result, | |
1012 | instead of what happens to be the last function defined in the loaded file. | |
0f4ca9d1 JM |
1013 | * Call `read-string` on the string returned from `slurp`. This uses |
1014 | the reader to read/convert the file contents into mal data/AST. | |
1015 | * Call `eval` (the one in the REPL environment) on the AST returned | |
1016 | from `read-string` to "run" it. | |
1017 | ||
627bd6f7 DM |
1018 | Besides adding file and eval support, we'll add support for the atom data type |
1019 | in this step. An atom is the Mal way to represent *state*; it is | |
1020 | heavily inspired by [Clojure's atoms](http://clojure.org/state). An atom holds | |
1021 | a reference to a single Mal value of any type; it supports reading that Mal value | |
1022 | and *modifying* the reference to point to another Mal value. Note that this is | |
1023 | the only Mal data type that is mutable (but the Mal values it refers to are | |
1024 | still immutable; immutability is explained in greater detail in step 7). | |
a85b8454 | 1025 | You'll need to add 5 functions to the core namespace to support atoms: |
627bd6f7 DM |
1026 | |
1027 | * `atom`: Takes a Mal value and returns a new atom which points to that Mal value. | |
1028 | * `atom?`: Takes an argument and returns `true` if the argument is an atom. | |
1029 | * `deref`: Takes an atom argument and returns the Mal value referenced by this atom. | |
1030 | * `reset!`: Takes an atom and a Mal value; the atom is modified to refer to | |
1031 | the given Mal value. The Mal value is returned. | |
1032 | * `swap!`: Takes an atom, a function, and zero or more function arguments. The | |
1c2eab90 | 1033 | atom's value is modified to the result of applying the function with the atom's |
627bd6f7 DM |
1034 | value as the first argument and the optionally given function arguments as |
1035 | the rest of the arguments. The new atom's value is returned. (Side note: Mal is | |
1036 | single-threaded, but in concurrent languages like Clojure, `swap!` promises | |
1037 | atomic update: `(swap! myatom (fn* [x] (+ 1 x)))` will always increase the | |
1038 | `myatom` counter by one and will not suffer from missing updates when the | |
1039 | atom is updated from multiple threads.) | |
1040 | ||
1041 | Optionally, you can add a reader macro `@` which will serve as a short form for | |
1042 | `deref`, so that `@a` is equivalent to `(deref a)`. In order to do that, modify | |
1043 | the conditional in reader `read_form` function and add a case which deals with | |
1044 | the `@` token: if the token is `@` (at sign) then return a new list that | |
1045 | contains the symbol `deref` and the result of reading the next form | |
1046 | (`read_form`). | |
1047 | ||
0f4ca9d1 | 1048 | Now go to the top level, run the step 6 tests. The optional tests will |
627bd6f7 DM |
1049 | need support from the reader for comments, vectors, hash-maps and the `@` |
1050 | reader macro: | |
0f4ca9d1 | 1051 | ``` |
e5737b08 | 1052 | make "test^quux^step6" |
0f4ca9d1 JM |
1053 | ``` |
1054 | ||
1055 | Congratulations, you now have a full-fledged scripting language that | |
6fef8e58 JM |
1056 | can run other mal programs. The `slurp` function loads a file as |
1057 | a string, the `read-string` function calls the mal reader to turn that | |
10a76646 | 1058 | string into data, and the `eval` function takes data and evaluates it |
6fef8e58 JM |
1059 | as a normal mal program. However, it is important to note that the |
1060 | `eval` function is not just for running external programs. Because mal | |
1061 | programs are regular mal data structures, you can dynamically generate | |
1062 | or manipulate those data structures before calling `eval` on them. | |
a85b8454 | 1063 | This isomorphism (same shape) between data and programs is known as |
6fef8e58 JM |
1064 | "homoiconicity". Lisp languages are homoiconic and this property |
1065 | distinguishes them from most other programming languages. | |
1066 | ||
1067 | You mal implementation is quite powerful already but the set of | |
1068 | functions that are available (from `core.qx`) is fairly limited. The | |
1069 | bulk of the functions you will add are described in step 9 and step A, | |
1070 | but you will begin to flesh them out over the next few steps to | |
1071 | support quoting (step 7) and macros (step 8). | |
0f4ca9d1 JM |
1072 | |
1073 | ||
45a8b3ca | 1074 | #### Deferrable: |
0f4ca9d1 JM |
1075 | |
1076 | * Add the ability to run another mal program from the command line. | |
1077 | Prior to the REPL loop, check if your mal implementation is called | |
1078 | with command line arguments. If so, treat the first argument as | |
1079 | a filename and use `rep` to call `load-file` on that filename, and | |
1080 | finally exit/terminate execution. | |
1081 | ||
1082 | * Add the rest of the command line arguments to your REPL environment | |
1083 | so that programs that are run with `load-file` have access to their | |
a85b8454 | 1084 | calling environment. Add a new "\*ARGV\*" (symbol) entry to your REPL |
0f4ca9d1 JM |
1085 | environment. The value of this entry should be the rest of the |
1086 | command line arguments as a mal list value. | |
1087 | ||
1088 | ||
1089 | <a name="step7"></a> | |
1090 | ||
1091 | ### Step 7: Quoting | |
1092 | ||
1093 | ![step7_quote architecture](step7_quote.png) | |
1094 | ||
1095 | In step 7 you will add the special forms `quote` and `quasiquote` and | |
1096 | add supporting core functions `cons` and `concat`. The two quote forms | |
1097 | add a powerful abstraction for manipulating mal code itself | |
1098 | (meta-programming). | |
1099 | ||
0f4ca9d1 JM |
1100 | The `quote` special form indicates to the evaluator (`EVAL`) that the |
1101 | parameter should not be evaluated (yet). At first glance, this might | |
10a76646 | 1102 | not seem particularly useful but an example of what this enables is the |
0f4ca9d1 JM |
1103 | ability for a mal program to refer to a symbol itself rather than the |
1104 | value that it evaluates to. Likewise with lists. For example, consider | |
1105 | the following: | |
1106 | ||
1107 | * `(prn abc)`: this will lookup the symbol `abc` in the current | |
1108 | evaluation environment and print it. This will result in error if | |
1109 | `abc` is not defined. | |
1110 | * `(prn (quote abc))`: this will print "abc" (prints the symbol | |
1111 | itself). This will work regardless of whether `abc` is defined in | |
1112 | the current environment. | |
1113 | * `(prn (1 2 3))`: this will result in an error because `1` is not | |
1114 | a function and cannot be applied to the arguments `(2 3)`. | |
1115 | * `(prn (quote (1 2 3)))`: this will print "(1 2 3)". | |
1116 | * `(def! l (quote (1 2 3)))`: list quoting allows us to define lists | |
1117 | directly in the code (list literal). Another way of doing this is | |
1118 | with the list function: `(def! l (list 1 2 3))`. | |
1119 | ||
1120 | The second special quoting form is `quasiquote`. This allows a quoted | |
1121 | list to have internal elements of the list that are temporarily | |
1122 | unquoted (normal evaluation). There are two special forms that only | |
1123 | mean something within a quasiquoted list: `unquote` and | |
1124 | `splice-unquote`. These are perhaps best explained with some examples: | |
1125 | ||
1126 | * `(def! lst (quote (2 3)))` -> `(2 3)` | |
1127 | * `(quasiquote (1 (unquote lst)))` -> `(1 (2 3))` | |
1128 | * `(quasiquote (1 (splice-unquote lst)))` -> `(1 2 3)` | |
1129 | ||
1130 | The `unquote` form turns evaluation back on for its argument and the | |
1131 | result of evaluation is put in place into the quasiquoted list. The | |
1132 | `splice-unquote` also turns evaluation back on for its argument, but | |
1133 | the evaluated value must be a list which is then "spliced" into the | |
1134 | quasiquoted list. The true power of the quasiquote form will be | |
1c2eab90 | 1135 | manifest when it is used together with macros (in the next step). |
0f4ca9d1 | 1136 | |
ffd31966 JM |
1137 | Compare the pseudocode for step 6 and step 7 to get a basic idea of |
1138 | the changes that will be made during this step: | |
1139 | ``` | |
1140 | diff -urp ../process/step6_file.txt ../process/step7_quote.txt | |
1141 | ``` | |
1142 | ||
1143 | * Copy `step6_file.qx` to `step7_quote.qx`. | |
1144 | ||
0f4ca9d1 | 1145 | * Before implementing the quoting forms, you will need to implement |
da39e01b | 1146 | some supporting functions in the core namespace: |
0f4ca9d1 JM |
1147 | * `cons`: this function takes a list as its second |
1148 | parameter and returns a new list that has the first argument | |
1149 | prepended to it. | |
1150 | * `concat`: this functions takes 0 or more lists as | |
1151 | parameters and returns a new list that is a concatenation of all | |
1152 | the list parameters. | |
1153 | ||
1154 | An aside on immutability: note that neither cons or concat mutate | |
1155 | their original list arguments. Any references to them (i.e. other | |
1156 | lists that they may be "contained" in) will still refer to the | |
1157 | original unchanged value. Mal, like Clojure, is a language which uses | |
1158 | immutable data structures. I encourage you to read about the power and | |
1159 | importance of immutability as implemented in Clojure (from which | |
1160 | Mal borrows most of its syntax and feature-set). | |
1161 | ||
1162 | * Add the `quote` special form. This form just returns its argument | |
1163 | (the second list element of `ast`). | |
1164 | ||
1165 | * Add the `quasiquote` special form. First implement a helper function | |
1166 | `is_pair` that returns true if the parameter is a non-empty list. | |
1167 | Then define a `quasiquote` function. This is called from `EVAL` with | |
1168 | the first `ast` argument (second list element) and then `ast` is set | |
1169 | to the result and execution continues at the top of the loop (TCO). | |
1170 | The `quasiquote` function takes a parameter `ast` and has the | |
1171 | following conditional: | |
1172 | 1. if `is_pair` of `ast` is false: return a new list containing: | |
1173 | a symbol named "quote" and `ast`. | |
1174 | 2. else if the first element of `ast` is a symbol named "unquote": | |
1175 | return the second element of `ast`. | |
1c2eab90 | 1176 | 3. if `is_pair` of the first element of `ast` is true and the first |
0f4ca9d1 JM |
1177 | element of first element of `ast` (`ast[0][0]`) is a symbol named |
1178 | "splice-unquote": return a new list containing: a symbol named | |
1179 | "concat", the second element of first element of `ast` | |
1180 | (`ast[0][1]`), and the result of calling `quasiquote` with the | |
1181 | second through last element of `ast`. | |
1182 | 4. otherwise: return a new list containing: a symbol named "cons", the | |
1183 | result of calling `quasiquote` on first element of `ast` | |
1c2eab90 | 1184 | (`ast[0]`), and the result of calling `quasiquote` with the second |
0f4ca9d1 JM |
1185 | through last element of `ast`. |
1186 | ||
1187 | ||
1188 | Now go to the top level, run the step 7 tests: | |
1189 | ``` | |
e5737b08 | 1190 | make "test^quux^step7" |
0f4ca9d1 JM |
1191 | ``` |
1192 | ||
1193 | Quoting is one of the more mundane functions available in mal, but do | |
1194 | not let that discourage you. Your mal implementation is almost | |
da39e01b | 1195 | complete, and quoting sets the stage for the next very exciting step: |
0f4ca9d1 JM |
1196 | macros. |
1197 | ||
1198 | ||
45a8b3ca | 1199 | #### Deferrable |
0f4ca9d1 JM |
1200 | |
1201 | * The full names for the quoting forms are fairly verbose. Most Lisp | |
1202 | languages have a short-hand syntax and Mal is no exception. These | |
1203 | short-hand syntaxes are known as reader macros because they allow us | |
1204 | to manipulate mal code during the reader phase. Macros that run | |
1205 | during the eval phase are just called "macros" and are described in | |
1206 | the next section. Expand the conditional with reader `read_form` | |
1207 | function to add the following four cases: | |
1208 | * token is "'" (single quote): return a new list that contains the | |
1209 | symbol "quote" and the result of reading the next form | |
1210 | (`read_form`). | |
8f41f75a | 1211 | * token is "\`" (back-tick): return a new list that contains the |
0f4ca9d1 JM |
1212 | symbol "quasiquote" and the result of reading the next form |
1213 | (`read_form`). | |
1214 | * token is "~" (tilde): return a new list that contains the | |
1215 | symbol "unquote" and the result of reading the next form | |
1216 | (`read_form`). | |
1217 | * token is "~@" (tilde + at sign): return a new list that contains | |
1218 | the symbol "splice-unquote" and the result of reading the next | |
1219 | form (`read_form`). | |
1220 | ||
1221 | * Add support for quoting of vectors. The `is_pair` function should | |
1222 | return true if the argument is a non-empty list or vector. `cons` | |
1223 | should also accept a vector as the second argument. The return value | |
1224 | is a list regardless. `concat` should support concatenation of | |
1225 | lists, vectors, or a mix or both. The result is always a list. | |
1226 | ||
1227 | ||
1228 | <a name="step8"></a> | |
1229 | ||
1230 | ### Step 8: Macros | |
1231 | ||
1232 | ![step8_macros architecture](step8_macros.png) | |
1233 | ||
a85b8454 | 1234 | Your mal implementation is now ready for one of the most lispy and |
0f4ca9d1 JM |
1235 | exciting of all programming concepts: macros. In the previous step, |
1236 | quoting enabled some simple manipulation data structures and therefore | |
1237 | manipulation of mal code (because the `eval` function from step | |
1238 | 6 turns mal data into code). In this step you will be able to mark mal | |
1239 | functions as macros which can manipulate mal code before it is | |
1240 | evaluated. In other words, macros are user-defined special forms. Or | |
1241 | to look at it another way, macros allow mal programs to redefine | |
1242 | the mal language itself. | |
1243 | ||
1244 | Compare the pseudocode for step 7 and step 8 to get a basic idea of | |
1245 | the changes that will be made during this step: | |
1246 | ``` | |
1247 | diff -urp ../process/step7_quote.txt ../process/step8_macros.txt | |
1248 | ``` | |
1249 | ||
ffd31966 JM |
1250 | * Copy `step7_quote.qx` to `step8_macros.qx`. |
1251 | ||
1252 | ||
0f4ca9d1 JM |
1253 | You might think that the infinite power of macros would require some |
1254 | sort of complex mechanism, but the implementation is actually fairly | |
1255 | simple. | |
1256 | ||
1257 | * Add a new attribute `is_macro` to mal function types. This should | |
1258 | default to false. | |
1259 | ||
1260 | * Add a new special form `defmacro!`. This is very similar to the | |
1261 | `def!` form, but before the evaluated value (mal function) is set in | |
1262 | the environment, the `is_macro` attribute should be set to true. | |
1263 | ||
1264 | * Add a `is_macro_call` function: This function takes arguments `ast` | |
1265 | and `env`. It returns true if `ast` is a list that contains a symbol | |
1266 | as the first element and that symbol refers to a function in the | |
1267 | `env` environment and that function has the `is_macro` attribute set | |
1268 | to true. Otherwise, it returns false. | |
1269 | ||
1270 | * Add a `macroexpand` function: This function takes arguments `ast` | |
1271 | and `env`. It calls `is_macro_call` with `ast` and `env` and loops | |
1272 | while that condition is true. Inside the loop, the first element of | |
1273 | the `ast` list (a symbol), is looked up in the environment to get | |
1274 | the macro function. This macro function is then called/applied with | |
1275 | the rest of the `ast` elements (2nd through the last) as arguments. | |
1276 | The return value of the macro call becomes the new value of `ast`. | |
1277 | When the loop completes because `ast` no longer represents a macro | |
1278 | call, the current value of `ast` is returned. | |
1279 | ||
1280 | * In the evaluator (`EVAL`) before the special forms switch (apply | |
1281 | section), perform macro expansion by calling the `macroexpand` | |
1282 | function with the current value of `ast` and `env`. Set `ast` to the | |
1283 | result of that call. If the new value of `ast` is no longer a list | |
9b4cfe03 JM |
1284 | after macro expansion, then return the result of calling `eval_ast` |
1285 | on it, otherwise continue with the rest of the apply section | |
1286 | (special forms switch). | |
0f4ca9d1 JM |
1287 | |
1288 | * Add a new special form condition for `macroexpand`. Call the | |
1289 | `macroexpand` function using the first `ast` argument (second list | |
1290 | element) and `env`. Return the result. This special form allows | |
1291 | a mal program to do explicit macro expansion without applying the | |
1292 | result (which can be useful for debugging macro expansion). | |
1293 | ||
1294 | Now go to the top level, run the step 8 tests: | |
1295 | ``` | |
e5737b08 | 1296 | make "test^quux^step8" |
0f4ca9d1 JM |
1297 | ``` |
1298 | ||
8a98ef9a JM |
1299 | There is a reasonably good chance that the macro tests will not pass |
1300 | the first time. Although the implementation of macros is fairly | |
1301 | simple, debugging runtime bugs with macros can be fairly tricky. If | |
1302 | you do run into subtle problems that are difficult to solve, let me | |
1303 | recommend a couple of approaches: | |
1304 | ||
1305 | * Use the macroexpand special form to eliminate one of the layers of | |
1306 | indirection (to expand but skip evaluate). This will often reveal | |
1307 | the source of the issue. | |
1308 | * Add a debug print statement to the top of your main `eval` function | |
1309 | (inside the TCO loop) to print the current value of `ast` (hint use | |
1310 | `pr_str` to get easier to debug output). Pull up the step8 | |
1311 | implementation from another language and uncomment its `eval` | |
1312 | function (yes, I give you permission to violate the rule this once). | |
1313 | Run the two side-by-side. The first difference is likely to point to | |
1314 | the bug. | |
1315 | ||
1316 | Congratulations! You now have a Lisp interpreter with a super power | |
1317 | that most non-Lisp languages can only dream of (I have it on good | |
1318 | authority that languages dream when you are not using them). If you | |
1319 | are not already familiar with Lisp macros, I suggest the following | |
a85b8454 | 1320 | exercise: write a recursive macro that handles postfixed mal code |
8a98ef9a JM |
1321 | (with the function as the last parameter instead of the first). Or |
1322 | not. I have not actually done so myself, but I have heard it is an | |
a85b8454 | 1323 | interesting exercise. |
8a98ef9a JM |
1324 | |
1325 | In the next step you will add try/catch style exception handling to | |
1326 | your implementation in addition to some new core functions. After | |
1327 | step9 you will be very close to having a fully self-hosting mal | |
1328 | implementation. Let us continue! | |
1329 | ||
0f4ca9d1 | 1330 | |
dc791440 | 1331 | #### Deferrable |
0f4ca9d1 JM |
1332 | |
1333 | * Add the following new core functions which are frequently used in | |
1334 | macro functions: | |
1335 | * `nth`: this function takes a list (or vector) and a number (index) | |
1336 | as arguments, returns the element of the list at the given index. | |
1337 | If the index is out of range, this function raises an exception. | |
1338 | * `first`: this function takes a list (or vector) as its argument | |
1339 | and return the first element. If the list (or vector) is empty or | |
1340 | is `nil` then `nil` is returned. | |
1341 | * `rest`: this function takes a list (or vector) as its argument and | |
66b7ea46 GL |
1342 | returns a new list containing all the elements except the first. If |
1343 | the list (or vector) is empty or is `nil` then `()` (empty list) | |
1344 | is returned. | |
0f4ca9d1 | 1345 | |
26ced15b NB |
1346 | * In the main program, call the `rep` function with the following |
1347 | string argument to define a new control structure. | |
1348 | ``` | |
1349 | "(defmacro! cond (fn* (& xs) (if (> (count xs) 0) (list 'if (first xs) (if (> (count xs) 1) (nth xs 1) (throw \"odd number of forms to cond\")) (cons 'cond (rest (rest xs)))))))" | |
1350 | ``` | |
dd4020b9 JM |
1351 | * Note that `cond` calls the `throw` function when `cond` is |
1352 | called with an odd number of args. The `throw` function is | |
1353 | implemented in the next step, but it will still serve it's | |
1354 | purpose here by causing an undefined symbol error. | |
0f4ca9d1 JM |
1355 | |
1356 | ||
ffd31966 JM |
1357 | <a name="step9"></a> |
1358 | ||
1359 | ### Step 9: Try | |
1360 | ||
1361 | ![step9_try architecture](step9_try.png) | |
1362 | ||
2345a3da JM |
1363 | In this step you will implement the final mal special form for |
1364 | error/exception handling: `try*/catch*`. You will also add several core | |
10a76646 AS |
1365 | functions to your implementation. In particular, you will enhance the |
1366 | functional programming pedigree of your implementation by adding the | |
2345a3da JM |
1367 | `apply` and `map` core functions. |
1368 | ||
ffd31966 | 1369 | Compare the pseudocode for step 8 and step 9 to get a basic idea of |
dbac60df | 1370 | the changes that will be made during this step: |
ffd31966 JM |
1371 | ``` |
1372 | diff -urp ../process/step8_macros.txt ../process/step9_try.txt | |
1373 | ``` | |
1374 | ||
1375 | * Copy `step8_macros.qx` to `step9_try.qx`. | |
1376 | ||
2345a3da JM |
1377 | * Add the `try*/catch*` special form to the EVAL function. The |
1378 | try catch form looks like this: `(try* A (catch* B C))`. The form | |
1379 | `A` is evaluated, if it throws an exception, then form `C` is | |
1c2eab90 | 1380 | evaluated with a new environment that binds the symbol `B` to the |
2345a3da JM |
1381 | value of the exception that was thrown. |
1382 | * If your target language has built-in try/catch style exception | |
1383 | handling then you are already 90% of the way done. Add a | |
1c2eab90 | 1384 | (native language) try/catch block that evaluates `A` within |
2345a3da JM |
1385 | the try block and catches all exceptions. If an exception is |
1386 | caught, then translate it to a mal type/value. For native | |
1387 | exceptions this is either the message string or a mal hash-map | |
1388 | that contains the message string and other attributes of the | |
1c2eab90 | 1389 | exception. When a regular mal type/value is used as an |
2345a3da JM |
1390 | exception, you will probably need to store it within a native |
1391 | exception type in order to be able to convey/transport it using | |
1392 | the native try/catch mechanism. Then you will extract the mal | |
1393 | type/value from the native exception. Create a new mal environment | |
1c2eab90 | 1394 | that binds `B` to the value of the exception. Finally, evaluate `C` |
2345a3da JM |
1395 | using that new environment. |
1396 | * If your target language does not have built-in try/catch style | |
1397 | exception handling then you have some extra work to do. One of the | |
1398 | most straightforward approaches is to create a a global error | |
1399 | variable that stores the thrown mal type/value. The complication | |
1400 | is that there are a bunch of places where you must check to see if | |
1401 | the global error state is set and return without proceeding. The | |
1402 | rule of thumb is that this check should happen at the top of your | |
1403 | EVAL function and also right after any call to EVAL (and after any | |
1404 | function call that might happen to call EVAL further down the | |
1405 | chain). Yes, it is ugly, but you were warned in the section on | |
1406 | picking a language. | |
1407 | ||
cfdf00cc | 1408 | * Add the `throw` core function. |
2345a3da JM |
1409 | * If your language supports try/catch style exception handling, then |
1410 | this function takes a mal type/value and throws/raises it as an | |
1411 | exception. In order to do this, you may need to create a custom | |
1412 | exception object that wraps a mal value/type. | |
1413 | * If your language does not support try/catch style exception | |
1414 | handling, then set the global error state to the mal type/value. | |
1415 | ||
1416 | * Add the `apply` and `map` core functions. In step 5, if you did not | |
1417 | add the original function (`fn`) to the structure returned from | |
bd62ff74 JM |
1418 | `fn*`, the you will need to do so now. |
1419 | * `apply`: takes at least two arguments. The first argument is | |
1420 | a function and the last argument is list (or vector). The | |
a85b8454 | 1421 | arguments between the function and the last argument (if there are |
bd62ff74 JM |
1422 | any) are concatenated with the final argument to create the |
1423 | arguments that are used to call the function. The apply | |
1424 | function allows a function to be called with arguments that are | |
1425 | contained in a list (or vector). In other words, `(apply F A B [C | |
1426 | D])` is equivalent to `(F A B C D)`. | |
1427 | * `map`: takes a function and a list (or vector) and evaluates the | |
1428 | function against every element of the list (or vector) one at | |
1429 | a time and returns the results as a list. | |
2345a3da JM |
1430 | |
1431 | * Add some type predicates core functions. In Lisp, predicates are | |
1432 | functions that return true/false (or true value/nil) and typically | |
bd62ff74 JM |
1433 | end in "?" or "p". |
1434 | * `nil?`: takes a single argument and returns true (mal true value) | |
1435 | if the argument is nil (mal nil value). | |
1436 | * `true?`: takes a single argument and returns true (mal true value) | |
1437 | if the argument is a true value (mal true value). | |
1438 | * `false?`: takes a single argument and returns true (mal true | |
1439 | value) if the argument is a false value (mal false value). | |
1440 | * `symbol?`: takes a single argument and returns true (mal true | |
1441 | value) if the argument is a symbol (mal symbol value). | |
1442 | ||
1443 | Now go to the top level, run the step 9 tests: | |
1444 | ``` | |
e5737b08 | 1445 | make "test^quux^step9" |
bd62ff74 JM |
1446 | ``` |
1447 | ||
1448 | Your mal implementation is now essentially a fully featured Lisp | |
1449 | interpreter. But if you stop now you will miss one of the most | |
1450 | satisfying and enlightening aspects of creating a mal implementation: | |
1451 | self-hosting. | |
2345a3da | 1452 | |
dc791440 | 1453 | #### Deferrable |
2345a3da | 1454 | |
e37d9b49 JM |
1455 | * Add the following new core functions: |
1456 | * `symbol`: takes a string and returns a new symbol with the string | |
1457 | as its name. | |
1458 | * `keyword`: takes a string and returns a keyword with the same name | |
1459 | (usually just be prepending the special keyword | |
1460 | unicode symbol). This function should also detect if the argument | |
1461 | is already a keyword and just return it. | |
1462 | * `keyword?`: takes a single argument and returns true (mal true | |
1463 | value) if the argument is a keyword, otherwise returns false (mal | |
1464 | false value). | |
1465 | * `vector`: takes a variable number of arguments and returns | |
1466 | a vector containing those arguments. | |
1467 | * `vector?`: takes a single argument and returns true (mal true | |
1468 | value) if the argument is a vector, otherwise returns false (mal | |
1469 | false value). | |
da39e01b | 1470 | * `sequential?`: takes a single argument and returns true (mal true |
ea0660cb BH |
1471 | value) if it is a list or a vector, otherwise returns false (mal |
1472 | false value). | |
e37d9b49 JM |
1473 | * `hash-map`: takes a variable but even number of arguments and |
1474 | returns a new mal hash-map value with keys from the odd arguments | |
1475 | and values from the even arguments respectively. This is basically | |
1476 | the functional form of the `{}` reader literal syntax. | |
1477 | * `map?`: takes a single argument and returns true (mal true | |
1478 | value) if the argument is a hash-map, otherwise returns false (mal | |
1479 | false value). | |
1480 | * `assoc`: takes a hash-map as the first argument and the remaining | |
1481 | arguments are odd/even key/value pairs to "associate" (merge) into | |
1482 | the hash-map. Note that the original hash-map is unchanged | |
1483 | (remember, mal values are immutable), and a new hash-map | |
1484 | containing the old hash-maps key/values plus the merged key/value | |
1485 | arguments is returned. | |
1486 | * `dissoc`: takes a hash-map and a list of keys to remove from the | |
1487 | hash-map. Again, note that the original hash-map is unchanged and | |
1488 | a new hash-map with the keys removed is returned. Key arguments | |
1489 | that do not exist in the hash-map are ignored. | |
1490 | * `get`: takes a hash-map and a key and returns the value of looking | |
1491 | up that key in the hash-map. If the key is not found in the | |
1492 | hash-map then nil is returned. | |
1493 | * `contains?`: takes a hash-map and a key and returns true (mal true | |
1494 | value) if the key exists in the hash-map and false (mal false | |
1495 | value) otherwise. | |
1496 | * `keys`: takes a hash-map and returns a list (mal list value) of | |
1497 | all the keys in the hash-map. | |
1498 | * `vals`: takes a hash-map and returns a list (mal list value) of | |
1499 | all the values in the hash-map. | |
ffd31966 JM |
1500 | |
1501 | ||
8569b2af | 1502 | <a name="stepA"></a> |
ffd31966 | 1503 | |
396d869e | 1504 | ### Step A: Metadata, Self-hosting and Interop |
ffd31966 | 1505 | |
90f618cb | 1506 | ![stepA_mal architecture](stepA_mal.png) |
ffd31966 | 1507 | |
bd62ff74 JM |
1508 | You have reached the final step of your mal implementation. This step |
1509 | is kind of a catchall for things that did not fit into other steps. | |
1510 | But most importantly, the changes you make in this step will unlock | |
1511 | the magical power known as "self-hosting". You might have noticed | |
1512 | that one of the languages that mal is implemented in is "mal". Any mal | |
1513 | implementation that is complete enough can run the mal implementation | |
1514 | of mal. You might need to pull out your hammock and ponder this for | |
1515 | a while if you have never built a compiler or interpreter before. Look | |
1516 | at the step source files for the mal implementation of mal (it is not | |
1517 | cheating now that you have reached step A). | |
1518 | ||
1519 | If you deferred the implementation of keywords, vectors and hash-maps, | |
1520 | now is the time to go back and implement them if you want your | |
1521 | implementation to self-host. | |
1522 | ||
ffd31966 JM |
1523 | Compare the pseudocode for step 9 and step A to get a basic idea of |
1524 | the changes that will be made during this step: | |
1525 | ``` | |
dbac60df | 1526 | diff -urp ../process/step9_try.txt ../process/stepA_mal.txt |
ffd31966 JM |
1527 | ``` |
1528 | ||
90f618cb | 1529 | * Copy `step9_try.qx` to `stepA_mal.qx`. |
ffd31966 | 1530 | |
396d869e JM |
1531 | * Add the `readline` core function. This functions takes a |
1532 | string that is used to prompt the user for input. The line of text | |
1533 | entered by the user is returned as a string. If the user sends an | |
1534 | end-of-file (usually Ctrl-D), then nil is returned. | |
1535 | ||
33e37b68 JM |
1536 | * Add a new "\*host-language\*" (symbol) entry to your REPL |
1537 | environment. The value of this entry should be a mal string | |
0198b7a2 | 1538 | containing the name of the current implementation. |
33e37b68 JM |
1539 | |
1540 | * When the REPL starts up (as opposed to when it is called with | |
1541 | a script and/or arguments), call the `rep` function with this string | |
1542 | to print a startup header: | |
16ca60da | 1543 | "(println (str \"Mal [\" \*host-language\* \"]\"))". |
33e37b68 | 1544 | |
e7d1bb12 | 1545 | * Ensure that the REPL environment contains definitions for `time-ms`, |
28b63c0c | 1546 | `meta`, `with-meta`, `fn?` |
e7d1bb12 BH |
1547 | `string?`, `number?`, `seq`, and `conj`. It doesn't really matter |
1548 | what they do at this stage: they just need to be defined. Making | |
1549 | them functions that raise a "not implemented" exception would be | |
1550 | fine. | |
bd62ff74 | 1551 | |
8f41f75a | 1552 | Now go to the top level, run the step A tests: |
bd62ff74 | 1553 | ``` |
e5737b08 | 1554 | make "test^quux^stepA" |
bd62ff74 JM |
1555 | ``` |
1556 | ||
1557 | Once you have passed all the non-optional step A tests, it is time to | |
1558 | try self-hosting. Run your step A implementation as normal, but use | |
1559 | the file argument mode you added in step 6 to run a each of the step | |
1560 | from the mal implementation: | |
1561 | ``` | |
1562 | ./stepA_mal.qx ../mal/step1_read_print.mal | |
1563 | ./stepA_mal.qx ../mal/step2_eval.mal | |
1564 | ... | |
1565 | ./stepA_mal.qx ../mal/step9_try.mal | |
1566 | ./stepA_mal.qx ../mal/stepA_mal.mal | |
1567 | ``` | |
1568 | ||
10a76646 | 1569 | There is a very good chance that you will encounter an error at some |
bd62ff74 JM |
1570 | point while trying to run the mal in mal implementation steps above. |
1571 | Debugging failures that happen while self-hosting is MUCH more | |
1572 | difficult and mind bending. One of the best approaches I have | |
cfdf00cc | 1573 | personally found is to add prn statements to the mal implementation |
bd62ff74 JM |
1574 | step (not your own implementation of mal) that is causing problems. |
1575 | ||
1576 | Another approach I have frequently used is to pull out the code from | |
1577 | the mal implementation that is causing the problem and simplify it | |
1578 | step by step until you have a simple piece of mal code that still | |
1579 | reproduces the problem. Once the reproducer is simple enough you will | |
1580 | probably know where in your own implementation that problem is likely | |
1581 | to be. Please add your simple reproducer as a test case so that future | |
1582 | implementers will fix similar issues in their code before they get to | |
1583 | self-hosting when it is much more difficult to track down and fix. | |
1584 | ||
1585 | Once you can manually run all the self-hosted steps, it is time to run | |
1586 | all the tests in self-hosted mode: | |
1587 | ``` | |
e5737b08 | 1588 | make MAL_IMPL=quux "test^mal" |
bd62ff74 JM |
1589 | ``` |
1590 | ||
1591 | When you run into problems (which you almost certainly will), use the | |
1592 | same process described above to debug them. | |
1593 | ||
1594 | Congratulations!!! When all the tests pass, you should pause for | |
1595 | a moment and consider what you have accomplished. You have implemented | |
1596 | a Lisp interpreter that is powerful and complete enough to run a large | |
1597 | mal program which is itself an implementation of the mal language. You | |
1598 | might even be asking if you can continue the "inception" by using your | |
1599 | implementation to run a mal implementation which itself runs the mal | |
1600 | implementation. | |
1601 | ||
4881701a | 1602 | |
dc791440 | 1603 | #### Optional additions |
bd62ff74 | 1604 | |
28b63c0c NB |
1605 | * Add meta-data support to composite data types (lists, vectors |
1606 | and hash-maps), and to functions (native or not), by adding a new | |
1607 | metadata attribute that refers to another mal value/type | |
1608 | (nil by default). Add the following metadata related core functions | |
1609 | (and remove any stub versions): | |
1610 | * `meta`: this takes a single mal function argument and returns the | |
1611 | value of the metadata attribute. | |
1612 | * `with-meta`: this function takes two arguments. The first argument | |
1613 | is a mal function and the second argument is another mal | |
1614 | value/type to set as metadata. A copy of the mal function is | |
1615 | returned that has its `meta` attribute set to the second argument. | |
1616 | Note that it is important that the environment and macro attribute | |
1617 | of mal function are retained when it is copied. | |
1618 | * Add a reader-macro that expands the token "^" to | |
1619 | return a new list that contains the symbol "with-meta" and the | |
1620 | result of reading the next next form (2nd argument) (`read_form`) and the | |
1621 | next form (1st argument) in that order | |
1622 | (metadata comes first with the ^ macro and the function second). | |
1623 | * If you implemented as `defmacro!` to mutate an existing function | |
1624 | without copying it, you can now use the function copying mechanism | |
1625 | used for metadata to make functions immutable even in the | |
1626 | defmacro! case... | |
1627 | ||
21af40e4 | 1628 | * Add the following new core functions (and remove any stub versions): |
50756494 | 1629 | * `time-ms`: takes no arguments and returns the number of |
a85b8454 | 1630 | milliseconds since epoch (00:00:00 UTC January 1, 1970), or, if |
50756494 DM |
1631 | not possible, since another point in time (`time-ms` is usually |
1632 | used relatively to measure time durations). After `time-ms` is | |
dd4020b9 JM |
1633 | implemented, you can run the performance micro-benchmarks by |
1634 | running `make perf^quux`. | |
50756494 DM |
1635 | * `conj`: takes a collection and one or more elements as arguments |
1636 | and returns a new collection which includes the original | |
1637 | collection and the new elements. If the collection is a list, a | |
1638 | new list is returned with the elements inserted at the start of | |
1639 | the given list in opposite order; if the collection is a vector, a | |
1640 | new vector is returned with the elements added to the end of the | |
1641 | given vector. | |
396d869e | 1642 | * `string?`: returns true if the parameter is a string. |
59436f1a DM |
1643 | * `number?`: returns true if the parameter is a number. |
1644 | * `fn?`: returns true if the parameter is a function (internal or | |
1645 | user-defined). | |
1646 | * `macro?`: returns true if the parameter is a macro. | |
396d869e JM |
1647 | * `seq`: takes a list, vector, string, or nil. If an empty list, |
1648 | empty vector, or empty string ("") is passed in then nil is | |
1649 | returned. Otherwise, a list is returned unchanged, a vector is | |
1650 | converted into a list, and a string is converted to a list that | |
1651 | containing the original string split into single character | |
1652 | strings. | |
6440cc50 DM |
1653 | * For interop with the target language, add this core function: |
1654 | * `quux-eval`: takes a string, evaluates it in the target language, | |
1655 | and returns the result converted to the relevant Mal type. You | |
1656 | may also add other interop functions as you see fit; Clojure, for | |
1657 | example, has a function called `.` which allows calling Java | |
1658 | methods. If the target language is a static language, consider | |
1659 | using FFI or some language-specific reflection mechanism, if | |
1660 | available. The tests for `quux-eval` and any other interop | |
1661 | function should be added in `quux/tests/stepA_mal.mal` (see the | |
1662 | [tests for `lua-eval`](../lua/tests/stepA_mal.mal) as an example). | |
ffd31966 | 1663 | |
dd4020b9 JM |
1664 | ### Next Steps |
1665 | ||
1666 | * Join the #mal IRC channel. It's fairly quiet but there are bursts of | |
1667 | interesting conversation related to mal, Lisps, esoteric programming | |
1668 | languages, etc. | |
1669 | * If you have created an implementation for a new target language (or | |
1670 | a unique and interesting variant of an existing implementation), | |
1671 | consider sending a pull request to add it into the main mal | |
5d52baf1 TM |
1672 | repository. The [FAQ](../docs/FAQ.md#will-you-add-my-new-implementation) |
1673 | describes general requirements for getting an implementation merged | |
1674 | into the main repository. | |
dd4020b9 JM |
1675 | * Take your interpreter implementation and have it emit source code in |
1676 | the target language rather than immediately evaluating it. In other | |
1677 | words, create a compiler. | |
1678 | * Pick a new target language and implement mal in it. Pick a language | |
1679 | that is very different from any that you already know. | |
1680 | * Use your mal implementation to implement a real world project. Many | |
1681 | of these will force you to address interop. Some ideas: | |
1682 | * Web server (with mal as CGI language for extra points) | |
1683 | * An IRC/Slack chat bot | |
1684 | * An editor (GUI or curses) with mal as a scripting/extension | |
1685 | language. | |
1686 | * An AI player for a game like Chess or Go. | |
1687 | * Implement a feature in your mal implementation that is not covered | |
1688 | by this guide. Some ideas: | |
1689 | * Namespaces | |
1690 | * Multi-threading support | |
1691 | * Errors with line numbers and/or stack traces. | |
1692 | * Lazy sequences | |
1693 | * Clojure-style protocols | |
1694 | * Full call/cc (call-with-current-continuation) support | |
1695 | * Explicit TCO (i.e. `recur`) with tail-position error checking |