Commit | Line | Data |
---|---|---|
0f4ca9d1 JM |
1 | # The Make-A-Lisp Process |
2 | ||
3 | So you want to write a Lisp interpreter? Welcome! | |
4 | ||
5 | The goal of the Make-A-Lisp project is to make it easy to write your | |
6 | own Lisp interpreter without sacrificing those many "Aha!" moments | |
7 | that come from ascending the McCarthy mountain. When you reach the peak | |
8 | of this particular mountain, you will have an interpreter for the mal | |
9 | Lisp language that is powerful enough to be self-hosting, meaning it | |
10 | will be able to run a mal interpreter written in mal itself. | |
11 | ||
12 | So jump right in (er ... start the climb)! | |
13 | ||
14 | ## Pick a language | |
15 | ||
16 | You might already have a language in mind that you want to use. | |
17 | Technically speaking, mal can be implemented in any sufficiently | |
18 | complete programming (i.e. Turing complete), however, there are a few | |
19 | language features that can make the task MUCH easier. Here are some of | |
20 | them in rough order of importance: | |
21 | ||
22 | * A sequential compound data structure (e.g. arrays, lists, | |
23 | vectors, etc) | |
24 | * An associative compound data structure (e.g. a dictionary, | |
25 | hash-map, associative array, etc) | |
26 | * Function references (first class functions, function pointers, | |
27 | etc) | |
28 | * Real exception handling (try/catch, raise, throw, etc) | |
29 | * Variable argument functions (variadic, var args, splats, apply, etc) | |
30 | * Function closures | |
31 | * PCRE regular expressions | |
32 | ||
33 | In addition, the following will make your task especially easy: | |
34 | ||
bf518367 JM |
35 | * Dynamic typing / boxed types (specifically, the ability to store |
36 | different data types in the sequential and associative structures | |
37 | and the language keeps track of the type for you) | |
0f4ca9d1 JM |
38 | * Compound data types support arbitrary runtime "hidden" data |
39 | (metadata, metatables, dynamic fields attributes) | |
40 | ||
41 | Here are some examples of languages that have all of the above | |
42 | features: JavaScript, Ruby, Python, Lua, R, Clojure. | |
43 | ||
44 | Many of the most popular languages already have Mal implementations. | |
45 | However, this should not discourage you from creating your own | |
46 | implementation in a language that already has one. However, if you go | |
47 | this route, I suggest you avoid referring to the existing | |
48 | implementations (i.e. "cheating") to maximize your learning experience | |
49 | instead of just borrowing mine. On the other hand, if your goal is to | |
50 | add new implementations to mal as efficiently as possible, then you | |
51 | SHOULD find the most similar target language implementation and refer | |
52 | to it frequently. | |
53 | ||
bf518367 JM |
54 | If you want a fairly long list of programming languages with an |
55 | approximate measure of popularity, try the [Programming Language | |
56 | Popularity Chart](http://langpop.corger.nl/) | |
57 | ||
0f4ca9d1 JM |
58 | |
59 | ## Getting started | |
60 | ||
61 | * Install your chosen language interpreter/compiler, language package | |
62 | manager and build tools (if applicable) | |
63 | ||
64 | * Fork the mal repository on github and then clone your forked | |
65 | repository: | |
66 | ``` | |
67 | git clone git@github.com:YOUR_NAME/mal.git | |
68 | cd mal | |
69 | ``` | |
70 | ||
71 | * Make a new directory for your implementation. For example, if you | |
72 | * language is called "quux": | |
73 | ``` | |
74 | mkdir quux | |
75 | ``` | |
76 | ||
77 | * Modify the top level Makefile to allow the tests to be run against | |
78 | your implementation. For example, if your language is named "quux" | |
79 | and uses "qx" as the file extension, then make the following | |
80 | 3 modifications to Makefile: | |
81 | ``` | |
82 | IMPLS = ... quux ... | |
83 | ... | |
84 | quux_STEP_TO_PROG = mylang/$($(1)).qx | |
85 | ... | |
86 | quux_RUNSTEP = ../$(2) $(3) | |
87 | ``` | |
88 | ||
89 | This allows you to run tests against your implementation like this: | |
90 | ``` | |
91 | make test^quux^stepX | |
92 | ``` | |
93 | ||
94 | ||
95 | ## General hints | |
96 | ||
97 | Stackoverflow and Google are your best friends. Modern polyglot | |
98 | developers do not memorize dozens of programming languages. Instead, | |
99 | they learn the peculiar terminology used with each language and then | |
100 | use this to search for their answers. | |
101 | ||
102 | Here are some other resources where multiple languages are | |
103 | compared/described: | |
104 | * http://learnxinyminutes.com/ | |
105 | * http://hyperpolyglot.org/ | |
106 | * http://rosettacode.org/ | |
107 | ||
108 | Do not let yourself be bogged down by specific problems. While the | |
109 | make-a-lisp process is structured as a series of steps, the reality is | |
110 | that building a lisp interpreter is more like a branching tree. If you | |
111 | get stuck on tail call optimization, or hash-maps, move on to other | |
112 | things. You will often have a stroke of inspiration for a problem as | |
113 | you work through other functionality. I have tried to structure this | |
114 | guide and the tests to make clear which things are optional or can be | |
115 | deferred until later. | |
116 | ||
117 | An aside on optional bits: when you run the tests for a given step, | |
118 | the last tests are often marked with an "optional" header. This | |
119 | indicates that these are tests for functionality that is not critical | |
120 | to finish a basic mal implementation. Many of the steps in this | |
121 | process guide also have an "Optional" section, however, it is not | |
122 | quite the same meaning. Those sections do include the functionality | |
123 | that is marked as optional in the tests, but they also include | |
124 | functionality that becomes mandatory at a later step. In other words, | |
125 | this is a "make your own Lisp adventure". | |
126 | ||
127 | Use test driven development. Each step of the make-a-lisp process has | |
128 | a bunch of tests associated with it and there is an easy script to run | |
129 | all the tests for a specific step in the process. Pick a failing test, | |
130 | fix it, repeat until all the tests for that step pass. | |
131 | ||
132 | The `process` directory contains abbreviated pseudocode and | |
133 | architecture images for each step of the make-a-lisp process. Use | |
134 | a textual diff/comparison tool to compare the previous pseudocode step | |
135 | with the one you are working on. The architecture images have changes | |
136 | from the previous step highlighted in red. | |
137 | ||
138 | If you get stuck, find the same step or functionality in a different | |
139 | implementation language. | |
140 | ||
141 | ||
142 | ## The Make-A-Lisp Process | |
143 | ||
144 | In the steps that follow the name of the target language is "quux" and | |
145 | the file extension for that language is "qx". | |
146 | ||
147 | ||
148 | <a name="step0"></a> | |
149 | ||
150 | ### Step 0: The REPL | |
151 | ||
152 | ![step0_repl architecture](step0_repl.png) | |
153 | ||
154 | This step is basically just creating a skeleton of your interpreter. | |
155 | ||
156 | * Create a `step0_repl.qx` file in `quux/`. | |
157 | ||
158 | * Add the 4 trivial functions `READ`, `EVAL`, `PRINT`, and `rep` | |
159 | (read-eval-print). `READ`, `EVAL`, and `PRINT` are basically just | |
160 | stubs that return their first parameter (a string if your target | |
161 | language is a statically typed) and `rep` calls them in order | |
162 | passing the return to the input of the next. | |
163 | ||
164 | * Add a main loop that repeatedly prints a prompt, gets a line of | |
165 | input from the user, calls `rep` with that line of input, and then | |
166 | prints out the result from `rep`. It should also exit when you send | |
167 | it an EOF (often Ctrl-D). | |
168 | ||
169 | * If you are using a compiled (ahead-of-time rather than just-in-time) | |
170 | language, then create a Makefile (or appropriate project definition | |
171 | file) in your directory. | |
172 | ||
173 | Run your new program and make sure that it echos each line that you | |
174 | type. Because step0 is so trivial, there are no automated tests to run | |
175 | for it. | |
176 | ||
177 | Add and then commit your new `step0_repl.qx` and `Makefile` to git. | |
178 | ||
179 | Congratulations! You have just completed the first step of the | |
180 | make-a-lisp process. | |
181 | ||
182 | ||
183 | #### Optional: | |
184 | ||
185 | * Add full line editing and command history support to your | |
186 | interpreter REPL. Many languages have a library/module that provide | |
187 | line editing support. Another option if your language supports it is | |
188 | to use an FFI (foreign function interface) to load and call directly | |
189 | into GNU readline, editline, or libnoise library. Add line | |
190 | editing interface code to `readline.qx` | |
191 | ||
192 | ||
193 | <a name="step1"></a> | |
194 | ||
195 | ### Step 1: Read and Print | |
196 | ||
197 | ![step1_read_print architecture](step1_read_print.png) | |
198 | ||
199 | In this step, your interpreter will "read" the string from the user | |
200 | and parse it into an internal tree data structure (an abstract syntax | |
201 | tree) and then take that data structure and "print" it back to | |
202 | a string. | |
203 | ||
204 | In non-lisp languages, this step (called "lexing and parsing") can be | |
205 | one of the most complicated parts of the compiler/interpreter. In | |
206 | Lisp, the data structure that you want in memory is basically | |
207 | represented directly in the code that the programmer writes | |
208 | (homoiconicity). | |
209 | ||
210 | For example, if the string is "(+ 2 (* 3 4))" then the read function | |
211 | will process this into a tree structure that looks like this: | |
212 | ``` | |
213 | List | |
214 | / | \ | |
215 | / | \ | |
216 | / | \ | |
217 | Sym:+ Int:2 List | |
218 | / | \ | |
219 | / | \ | |
220 | / | \ | |
221 | Sym:* Int:3 Int:4 | |
222 | ``` | |
223 | ||
224 | Each left paren and its matching right paren (lisp "sexpr") becomes | |
225 | a node in the tree and everything else becomes a leaf in the tree. | |
226 | ||
227 | If you can find code for an implementation of a JSON encoder/decoder | |
228 | in your target language then you can probably just borrow and modify | |
229 | that and be 75% of the way done with this step. | |
230 | ||
231 | The rest of this section is going to assume that you are not starting | |
232 | from an existing JSON encoder/decoder, but that you do have access to | |
233 | a Perl compatible regular expressions (PCRE) module/library. You can | |
234 | certainly implement the reader using simple string operations, but it | |
235 | is more involved. The `make`, `ps` (postscript) and Haskell | |
236 | implementations have examples of a reader/parser without using regular | |
237 | expression support. | |
238 | ||
239 | * Copy `step0_repl.qx` to `step1_read_print.qx`. | |
240 | ||
241 | * Add a `reader.qx` file to hold functions related to the reader. | |
242 | ||
243 | * If the target language has objects types (OOP), then the next step | |
244 | is to create a simple stateful Reader object in `reader.qx`. This | |
245 | object will store the tokens and a position. The Reader object will | |
246 | have two methods: `next` and `peek`. `next` returns the tokens at | |
247 | the current position and increments the position. `peek` just | |
248 | returns the token at the current position. | |
249 | ||
250 | * Add a function `read_str` in `reader.qx`. This function | |
251 | will call `tokenizer` and then create a new Reader object instance | |
252 | with the tokens. Then it will call `read_form` with the Reader | |
253 | instance. | |
254 | ||
255 | * Add a function `tokenizer` in `reader.qx`. This function will take | |
256 | a single single string and return an array/list | |
257 | of all the tokens (strings) in it. The following regular expression | |
258 | (PCRE) will match all mal tokens. | |
259 | ``` | |
260 | [\s,]*(~@|[\[\]{}()'`~^@]|"(?:\\.|[^\\"])*"|;.*|[^\s\[\]{}('"`,;)]*) | |
261 | ``` | |
262 | ||
263 | * Add the function `read_form` to `reader.qx`. This function | |
264 | will peek at the first token in the Reader object and switch on the | |
265 | first character of that token. If the character is a left paren then | |
266 | `read_list` is called with the Reader object. Otherwise, `read_atom` | |
267 | is called with the Reader Object. The return value from `read_form` | |
268 | is a mal data type. If your target language is statically typed then | |
269 | you will need some way for `read_form` to return a variant or | |
270 | subclass type. For example, if your language is object oriented, | |
271 | then you cal define a top level MalType (in `types.qx`) that all | |
272 | your mal data types inherit from. The MalList type (which also | |
273 | inherits from MalType) will contains a list/array of other MalTypes. | |
274 | If your language is dynamically typed then you can likely just | |
275 | return a plain list/array of other mal types. | |
276 | ||
277 | * Add the function `read_list` to `reader.qx`. This function will | |
278 | repeatedly call `read_form` with the Reader object until it | |
279 | encounters a ')' token (if it reach EOF before reading a ')' then | |
280 | that is an error). It accumulates the results into a List type. If | |
281 | your language does not have a sequential data type that can hold mal | |
282 | type values you may need to implement one (in `types.qx`). Note | |
283 | that `read_list` repeatedly calls `read_form` rather than | |
284 | `read_atom`. This mutually recursive defintion between `read_list` | |
285 | and `read_form` is what allows lists to contain lists. | |
286 | ||
287 | * Add the function `read_atom` to `reader.qx`. This function will | |
288 | look at the contents of the token and return the appropriate scalar | |
289 | (simple/single) data type value. Initially, you can just implement | |
290 | numbers (integers) and symbols . This will allow you to proceed | |
291 | through the next couple of steps before you will need to implement | |
292 | the other fundamental mal types: nil, true, false, and string. The | |
293 | remaining mal types: keyword, vector, hash-map, and atom do not | |
294 | need to be implemented until step 9 (but can be implemented at any | |
295 | point between this step and that). BTW, symbols types are just an | |
296 | object that contains a single string name value (some languages have | |
297 | symbol types already). | |
298 | ||
299 | * Add a file `printer.qx`. This file will contain a single function | |
300 | `pr_str` which does the opposite of `read_str`: take a mal data | |
301 | structure and return a string representation of it. But `pr_str` is | |
302 | much simpler and is basically just a switch statement on the type of | |
303 | the input object: | |
304 | ||
305 | * symbol: return the string name of the symbol | |
306 | * number: return the number as a string | |
307 | * list: iterate through each element of the list calling `pr_str` on | |
308 | it, then join the results with a space separator, and surround the | |
309 | final result with parens | |
310 | ||
311 | * Change the `READ` function in `step1_read_print.qx` to call | |
312 | `reader.read_str` and the `PRINT` function to call `printer.pr_str`. | |
313 | `EVAL` continues to simply return its input but the type is now | |
314 | a mal data type. | |
315 | ||
316 | You now have enough hooked up to begin testing your code. You can | |
317 | manually try some simple inputs: | |
318 | * `123` -> `123` | |
319 | * ` 123 ` -> `123` | |
320 | * `abc` -> `abc` | |
321 | * ` abc ` -> `abc` | |
322 | * `(123 456)` -> `(123 456)` | |
323 | * `( 123 456 789 ) ` -> `(123 456 789)` | |
324 | * `( + 2 (+ 3 4) ) ` -> `(+ 2 (* 3 4))` | |
325 | ||
326 | To verify that your code is doing more than just eliminating extra | |
327 | spaces (and not failing), you can instrument your `reader.qx` functions. | |
328 | ||
329 | Once you have gotten past those simple manual tests, it is time to run | |
330 | the full suite of step 1 tests. Go to the top level and run the | |
331 | following: | |
332 | ``` | |
333 | make test^quux^step1 | |
334 | ``` | |
335 | ||
336 | Fix any test failures related to symbols, numbers and lists. | |
337 | ||
338 | Depending on the functionality of your target language, it is likely | |
339 | that you have now just completed one of the most difficult steps. It | |
340 | is down hill from here. The remaining steps will probably be easier | |
341 | and each step will give progressively more bang for the buck. | |
342 | ||
343 | #### Optional: | |
344 | ||
345 | ||
346 | * Add error checking to your reader functions to make sure parens | |
347 | are properly matched. Catch and print these errors in your main | |
348 | loop. If your language does not have try/catch style bubble up | |
349 | exception handling, then you will need to add explicit error | |
350 | handling to your code to catch and pass on errors without crashing. | |
351 | ||
352 | * Add support for the other basic data type to your reader and printer | |
353 | functions: string, nil, true, and false. These become mandatory at | |
354 | step 4. When a string is read, a slash followed by a doublequote is | |
355 | translatd into a plain doublequote character and a slash followed by | |
356 | "n" is translated into a newline. To properly print a string (for | |
357 | step 4 string functions), the `pr_str` function needs another | |
358 | parameter called `print_readably`. When `print_readably` is true, | |
359 | doublequotes and newlines are translated into their printed | |
360 | representations (the reverse of the reader). The `PRINT` function in | |
361 | the main program should call `pr_str` with print_readably set to | |
362 | true. | |
363 | ||
364 | * Add support for the other mal types: keyword, vector, hash-map, and | |
365 | atom. TODO/TBD | |
366 | * keyword: just a string stored with unicode prefix (or char 127 if | |
367 | no unicode support). | |
368 | * vector: can be implemented with same underlying type as list if | |
369 | there is some mechanism for marking/distringuishing from a list. | |
370 | * hash-map: only need to implement string keys (which enables | |
371 | keyword keys since they are just special strings). | |
372 | ||
373 | * Add support for reader macros which are special forms that are | |
374 | transformed into other forms during the read phase. | |
375 | ||
376 | * Add comment support to your reader. The tokenizer should ignore | |
377 | tokens that start with ";". Your `read_str` function will need to | |
378 | properly handle when the tokenizer returns no values. The simplest | |
379 | way to do this is to return `nil` mal value. A cleaner option (that | |
380 | does not print `nil` at the prompt is to throw a special exception | |
381 | that causes the main loop to simply continue at the beginning of the | |
382 | loop without calling `rep`. | |
383 | ||
384 | ||
385 | <a name="step2"></a> | |
386 | ||
387 | ### Step 2: Eval | |
388 | ||
389 | ![step2_eval architecture](step2_eval.png) | |
390 | ||
391 | In step 1 your mal interpreter was basically just a way to validate | |
392 | input and eliminate extraneous white space. In this step you will turn | |
393 | your interpreter into a simple number calculator by adding | |
394 | functionality to the evaluator (`EVAL`). | |
395 | ||
396 | Compare the pseudocode for step 1 and step 2 to get a basic idea of | |
397 | the changes that will be made during this step: | |
398 | ``` | |
399 | diff -urp ../process/step1_read_print.txt ../process/step2_eval.txt | |
400 | ``` | |
401 | ||
402 | * Copy `step1_read_print.qx` to `step2_eval.qx`. | |
403 | ||
404 | * Define a simple initial REPL environment. This environment is an | |
405 | associative structure that maps symbols (or symbol names) to | |
406 | numeric functions. For example, in python this would look something | |
407 | like this: | |
408 | ``` | |
409 | repl_env = {'+': lambda a,b: a+b, | |
410 | '-': lambda a,b: a-b, | |
411 | '*': lambda a,b: a*b, | |
412 | '/': lambda a,b: int(a/b)} | |
413 | ``` | |
414 | ||
415 | * Modify the `rep` function to pass the REPL environment as the second | |
416 | parameter for the `EVAL` call. | |
417 | ||
418 | * Create a new function `eval_ast` which takes `ast` (mal data type) | |
419 | and an associative structure (the environment from above). | |
420 | `eval_ast` switches on the type of `ast` as follows: | |
421 | ||
422 | * symbol: lookup the symbol in the environment structure and return | |
423 | the value or raise an error no value is found | |
424 | * list: return a new list that is the result of calling `EVAL` on | |
425 | each of the members of the list | |
426 | * otherwise just return the original `ast` value | |
427 | ||
428 | * Modify `EVAL` to check if the first parameter `ast` is a list. | |
429 | * `ast` is not a list: then return the result of calling `eval_ast` | |
430 | on it. | |
431 | * `ast` is a list: call `eval_ast` to get a new evaluated list. Take | |
432 | the first item of the evaluated list and call it as function using | |
433 | the rest of the evaluated list as its arguments. | |
434 | ||
435 | If your target language does not have full variable length argument | |
436 | support (e.g. variadic, vararg, splats, apply) then you will need to | |
437 | pass the full list of arguments as a single parameter and split apart | |
438 | the individual values inside of every mal function. This is annoying, | |
439 | but workable. | |
440 | ||
441 | The process of taking a list and invoking or executing it to return | |
442 | something new is known in Lisp as the "apply" phase. | |
443 | ||
444 | Try some simple expressions: | |
445 | ||
446 | * `(+ 2 3)` -> `5` | |
447 | * `(+ 2 (* 3 4))` -> `14` | |
448 | ||
449 | The most likely challenge you will encounter is how to properly call | |
450 | a function references using an arguments list. | |
451 | ||
452 | Now go to the top level, run the step 2 tests and fix the errors. | |
453 | ``` | |
454 | make test^quux^step2 | |
455 | ``` | |
456 | ||
457 | You now have a simple prefix notation calculator! | |
458 | ||
459 | ||
460 | <a name="step3"></a> | |
461 | ||
462 | ### Step 3: Environments | |
463 | ||
464 | ![step3_env architecture](step3_env.png) | |
465 | ||
466 | In step 2 you were already introduced to REPL environment (`repl_env`) | |
467 | where the basic numeric functions were stored and looked up. In this | |
468 | step you will add the ability to create new environments (`let*`) and | |
469 | modify exiting environments (`def!`). | |
470 | ||
471 | A Lisp environment is an associative data structure that maps symbols (the | |
472 | keys) to values. But Lisp environments have an additional important | |
473 | function: they can refer to another environment (the outer | |
474 | environment). During environment lookups, if the current environment | |
475 | does not have the symbol, the lookup continues in the outer | |
476 | environment, and continues this way until the symbol is either found, | |
477 | or the outer environment is `nil` (the outermost environment in the | |
478 | chain). | |
479 | ||
480 | Compare the pseudocode for step 2 and step 3 to get a basic idea of | |
481 | the changes that will be made during this step: | |
482 | ``` | |
483 | diff -urp ../process/step2_eval.txt ../process/step3_env.txt | |
484 | ``` | |
485 | ||
486 | * Copy `step2_eval.qx` to `step2_env.qx`. | |
487 | ||
488 | * Create `env.qx` to hold the environment definition. | |
489 | ||
490 | * Define an `Env` object that is instantiated with a single `outer` | |
491 | parameter and starts with an empty associative data structure | |
492 | property `data`. | |
493 | ||
494 | * Define three methods for the Env object: | |
495 | * set: takes a symbol key and a mal value and adds to the `data` | |
496 | structure | |
497 | * find: takes a symbol key and if the current environment contains | |
498 | that key then return the environment. If no key is found and outer | |
499 | is not `nil` then call find (recurse) on the outer environment. | |
500 | * get: takes a symbol key and uses the `find` method to locate the | |
501 | environment with the key, then returns the matching value. If no | |
502 | key is found up the outer chain, then throws/raises a "not found" | |
503 | error. | |
504 | ||
505 | * Update `step2_env.qx` to use the new `Env` type to create the | |
506 | repl_env (with a `nil` outer value) and use the `set` method to add | |
507 | the numeric functions. | |
508 | ||
509 | * Modify `eval_ast` to call the `get` method on the `env` parameter. | |
510 | ||
511 | * Modify the apply section of `EVAL` to switch on the first element of | |
512 | the list: | |
513 | * symbol "def!": call the set method of the current environment | |
514 | (second parameter of `EVAL` called `env`) using the unevaluated | |
515 | first parameter (second list element) as the symbol key and the | |
516 | evaluated second parameter as the value. | |
517 | * symbol "let*": create a new environment using the current | |
518 | environment as the outer value and then use the first parameter as | |
519 | a list of new bindings in the "let" environment. Take the second | |
520 | element of the binding list, call `EVAL` using the new "let*" | |
521 | environment as the evaluation environment, then call `set` on the | |
522 | "let" environment using the first binding list element as the key | |
523 | and the evaluated second element as the value. This is repeated | |
524 | for each odd/even pair in the binding list. Note in particular, | |
525 | the bindings earlier in the list can be referred to by later | |
526 | bindings. Finally, the second parameter (third element) of the | |
527 | original `let*` form is evaluated using the new "let*" environment | |
528 | and the result is returned as the result of the `let*` (the new | |
529 | let environment is discarded upon completion). | |
530 | * otherwise: call `eval_ast` on the list and apply the first element | |
531 | to the rest as before. | |
532 | ||
533 | `def!` and `let*` are Lisp "specials" (or "special atoms") which means | |
534 | that they are language level features and more specifically that the | |
535 | rest of the list elements (arguments) may be evaluated differently (or | |
536 | not at all) unlike the default apply case where all elements of the | |
537 | list are evaluated before the first element is invoked. Lists which | |
538 | contain a "special" as the first element are known as "special forms". | |
539 | The are special because the follow special evaluation rules. | |
540 | ||
541 | Try some simple environment tests: | |
542 | ||
543 | * `(def! a 6)` -> `6` | |
544 | * `a` -> `6` | |
545 | * `(def! b (+ a 2))` -> `8` | |
546 | * `(+ a b)` -> `14` | |
547 | * `(let* (c 2) c)` -> `2` | |
548 | ||
549 | Now go to the top level, run the step 3 tests and fix the errors. | |
550 | ``` | |
551 | make test^quux^step3 | |
552 | ``` | |
553 | ||
554 | You mal implementation is still basically just a numeric calculator | |
555 | with save/restore capability. But you have set the foundation for step | |
556 | 4 where it will begin to feel like a real programming language. | |
557 | ||
558 | ||
559 | An aside on mutation and typing: | |
560 | ||
561 | The "!" suffix on symbols is used to indicate that this symbol refers | |
562 | to a function that mutates something else. In this case, the `def!` | |
563 | symbol indicates a special form that will mutate the current | |
564 | environment. Many (maybe even most) of runtime problems that are | |
565 | encountered in software engineering are a result of mutation. By | |
566 | clearly marking code where mutation may occur, you can more easily | |
567 | track down the likely cause of runtime problems when they do occur. | |
568 | ||
569 | Another cause of runtime errors is type errors, where a value of one | |
570 | type is unexpectedly treated by the program as a different and | |
571 | incompatible type. Statically typed languages try to make the | |
572 | programmer solve all type problems before the program is allowed to | |
573 | run. Most Lisp variants tend to be dynamically typed (types of values | |
574 | are checked when they are actually used at runtime). | |
575 | ||
576 | As an aside-aside: The great debate between static and dynamic typing | |
577 | debate can be understood by following the money. Advocates of strict | |
578 | static typing use words like "correctness" and "safety" and thus get | |
579 | government and academic funding. Advocates of dynamic typing use words | |
580 | like "agile" and "time-to-market" and thus get venture capital and | |
581 | commercial funding. | |
582 | ||
583 | ||
584 | <a name="step4"></a> | |
585 | ||
586 | ### Step 4: If Fn Do | |
587 | ||
588 | ![step4_if_fn_do architecture](step4_if_fn_do.png) | |
589 | ||
590 | In step 3 you added environments and the special forms for | |
591 | manipulating environments. In this step you will add 3 new special | |
592 | forms (`if`, `fn*` and `do`) and add several more core functions to | |
593 | the default REPL environment. Our new architecture will look like | |
594 | this: | |
595 | ||
596 | The `fn*` special form is how new user-defined functions are created. | |
597 | In some Lisps, this special form is named "lambda". | |
598 | ||
599 | Compare the pseudocode for step 3 and step 4 to get a basic idea of | |
600 | the changes that will be made during this step: | |
601 | ``` | |
602 | diff -urp ../process/step3_env.txt ../process/step4_if_fn_do.txt | |
603 | ``` | |
604 | ||
605 | * Copy `step3_env.qx` to `step4_if_fn_do.qx`. | |
606 | ||
607 | * If you have not implemented reader and printer support (and data | |
608 | types) for `nil`, `true` and `false`, you will need to do so for | |
609 | this step. | |
610 | ||
611 | * Update the constructor/initializer for environments to take two new | |
612 | arguments: `binds` and `exprs`. Bind (`set`) each element (symbol) | |
613 | of the binds list to the respective element of the `exprs` list. | |
614 | ||
615 | * Add support to `printer.qx` to print functions values. A string | |
616 | literal like "#<function>" is sufficient. | |
617 | ||
618 | * Add the following special forms to `EVAL`. | |
619 | ||
620 | * `do`: Evaluate the all the elements of the list and return the | |
621 | final element (evaluated). | |
622 | * `if`: Evaluate the first parameter (second element). If the result | |
623 | (condition) is anything other than `nil` or `false`, then evaluate | |
624 | the second parammeter (third element of the list) and return the | |
625 | result. Otherwise, evaluate the third parameter (fourth element) | |
626 | and return the result. If condition is false and there is no third | |
627 | parameter, then just return `nil`. | |
628 | * `fn*`: Return a new function closure. The body of that closure | |
629 | does the following: | |
630 | * Create a new environment using `env` (closed over from outer | |
631 | scope) as the `outer` parameter, the first parameter (second | |
632 | list element of `ast` from the outer scope) as the `binds` | |
633 | parameter, and the parameters to the closure as the `exprs` | |
634 | parameter. | |
635 | * Call `EVAL` on the second parameter (third list element of `ast` | |
636 | from outer scope), using the new environment. Use the result as | |
637 | the return value of the closure. | |
638 | ||
639 | If your target language does not support closures, then you will need | |
640 | to implement `fn*` using some sort of structure or object that stores | |
641 | the values being closed over: the first and second elements of the | |
642 | `ast` list (function parameter list and function body) and the current | |
643 | environment `env`. In this case, your native functions will need to be | |
644 | wrapped in the same way. You will probably also need a method/function | |
645 | that invokes your function object/structure for the default case of | |
646 | the apply section of `EVAL`. | |
647 | ||
648 | Try out the basic functionality you have implemented: | |
649 | ||
650 | * `(fn* [a] a)` -> `#<function>` | |
651 | * `( (fn* [a] a) 7)` -> `7` | |
652 | * `( (fn* [a] (+ a 1)) 10)` -> `11` | |
653 | * `( (fn* [a b] (+ a b)) 2 3)` -> `5` | |
654 | ||
655 | * Add a new file `core.qx` and define an associative data structure | |
656 | `ns` (namespace) that maps symbols to functions. Move the numeric | |
657 | function definitions into this structure. | |
658 | ||
659 | * Modify `step4_if_fn_do.qx` to iterate through the `core.ns` | |
660 | structure and add (`set`) each symbol/function mapping to the | |
661 | REPL environment (`repl_env`). | |
662 | ||
663 | * Add the following functions to `core.ns`: | |
664 | * `list`: take the parameters and return them as a list. | |
665 | * `list?`: return true if the first parameter is a list, false | |
666 | otherwise. | |
667 | * `empty?`: treat the first parameter as a list and return true if | |
668 | the list is empty and false if it contains any elements. | |
669 | * `count`: treat the first parameter as a list and return the number | |
670 | of elements that it contains. | |
671 | * `=`: compare the first two parameters and return true if they are | |
672 | the same type and contain the same value. In the case of equal | |
673 | length lists, each element of the list should be compared for | |
674 | equality and if they are the same return true, otherwise false. | |
675 | * `<`, `<=`, `>`, and `>=`: treat the first two parameters as | |
676 | numbers and do the corresponding numeric comparison, returning | |
677 | either true or false. | |
678 | ||
679 | Now go to the top level, run the step 4 tests. There are a lot of | |
680 | tests in step 4 but all of the non-optional tests that do not involve | |
681 | strings should be able to pass now. | |
682 | ||
683 | ``` | |
684 | make test^quux^step4 | |
685 | ``` | |
686 | ||
687 | Your mal implementation is already beginning to look like a real | |
688 | language. You have flow control, conditionals, user-defined functions | |
689 | with lexical scope, side-effects (if you implement the string | |
690 | functions), etc. However, our little interpreter has not quite reach | |
691 | Lisp-ness yet. The next several steps will take | |
692 | ||
693 | #### Optional: | |
694 | ||
695 | * Implement Clojure-style variadic function parameters. Modify the | |
696 | constructor/initializer for environments, so that if a "&" symbol is | |
697 | encountered in the `binds` list, the next symbol in the `binds` list | |
698 | after the "&" is bound to the rest of the `exprs` list that has not | |
699 | been bound yet. | |
700 | ||
701 | * Defines a `not` function using mal itself. In `step4_if_fn_do.qx` | |
702 | call the `rep` function with this string: | |
703 | "(def! not (fn* (a) (if a false true)))". | |
704 | ||
705 | * Implement the strings functions in `core.qx`. To implement these | |
706 | functions, you will need to implement the string support in the | |
707 | reader and printer (optional section of step 1). Each of the string | |
708 | functions takes multiple mal values, prints them (`pr_str`) and | |
709 | joins them together into a new string. | |
710 | * `pr-str`: calls `pr_str` on each argument with `print_readably` | |
711 | set to true, joins the results with " " and returns the new | |
712 | string. | |
713 | * `str`: calls `pr_str` on each argument with `print_readably` set | |
714 | to false, concatenates the results together ("" separator), and | |
715 | returns the new string. | |
716 | * `prn`: calls `pr_str` on each argument with `print_readably` set | |
717 | to true, joins the results with " ", prints the string to the | |
718 | screen and then returns `nil`. | |
719 | * `println`: calls `pr_str` on each argument with `print_readably` set | |
720 | to false, joins the results with " ", prints the string to the | |
721 | screen and then returns `nil`. | |
722 | ||
723 | ||
724 | <a name="step5"></a> | |
725 | ||
726 | ### Step 5: Tail call optimization | |
727 | ||
728 | ![step5_tco architecture](step5_tco.png) | |
729 | ||
730 | In step 4 you added special forms `do`, `if` and `fn*` and you defined | |
731 | some core functions. In this step you will add a Lisp feature called | |
732 | tail call optimization (TCO). Also called "tail recursion" or | |
733 | sometimes just "tail calls". | |
734 | ||
735 | Several of the special forms that you have defined in `EVAL` end up | |
736 | calling back into `EVAL`. For those forms that call `EVAL` as the last | |
737 | thing that they do before returning (tail call) you will just loop back | |
738 | to the beginning of eval rather than calling it again. The advantage | |
739 | of this approach is that it avoids adding more frames to the call | |
740 | stack. This is especially important in Lisp languages because they do | |
741 | not tend to have iteration control structures preferring recursion | |
742 | instead. However, with tail call optimization, recursion can be made | |
743 | as stack efficient as iteration. | |
744 | ||
745 | Compare the pseudocode for step 4 and step 5 to get a basic idea of | |
746 | the changes that will be made during this step: | |
747 | ``` | |
748 | diff -urp ../process/step4_if_fn_do.txt ../process/step5_tco.txt | |
749 | ``` | |
750 | ||
751 | * Copy `step4_env.qx` to `step5_tco.qx`. | |
752 | ||
753 | * Add a loop (e.g. while true) around all code in `EVAL`. | |
754 | ||
755 | * Modify each of the following form cases to add tail call recursion | |
756 | support: | |
757 | * `let*`: remove the final `EVAL` call on the second `ast` argument | |
758 | (third list element). Set `env` (i.e. the local variable passed in | |
759 | as second parameter of `EVAL`) to the new let environment. Set | |
760 | `ast` (i.e. the local variable passed in as first parameter of | |
761 | `EVAL`) to be the second `ast` argument. Continue at the beginning | |
762 | of the loop (no return). | |
763 | * `do`: change the `eval_ast` call to evaluate all the parameters | |
764 | the except for the last (2nd list element up to but not | |
765 | including last). Set `ast` to the last element of `ast`. Continue | |
766 | at the beginning of the loop (`env` stays unchanged). | |
767 | * `if`: the condition continues to be evaluated, however, rather | |
768 | than evaluating the true or false branch, `ast` is set to the | |
769 | unevaluated value of the chosen branch. Continue at the beginning | |
770 | of the loop (`env` is unchanged). | |
771 | ||
772 | * The return value from the `fn*` special form will now become an | |
773 | object/structure with attributes that allow the default invoke case | |
774 | of `EVAL` to do TCO on mal functions. Those attributes are: | |
775 | * `fn`: the original function value return in step 4 | |
776 | * `ast`: the second `ast` argument (third list element) representing | |
777 | the body of the function. | |
778 | * `params`: the first `ast` argument (second list element) | |
779 | representing the parameter names of the function. | |
780 | * `env`: the current value of the `env` parameter of `EVAL`. | |
781 | ||
782 | * The default "apply"/invoke case of `EVAL` must now be changed to | |
783 | account for the new object/structure returned by the `fn*` form. | |
784 | Continue to call `eval_ast` on `ast`. The first element is `f`. | |
785 | Switch on the type of `f`: | |
786 | * regular function (not one defined by `fn*`): apply/invoke it as | |
787 | * before (in step 4). | |
788 | * a `fn*` value: set `ast` to the `ast` attribute of `f`. Generate | |
789 | a new environment using the `env` and `params` attributes of `f` | |
790 | as the `outer` and `binds` arguments and rest `ast` arguments | |
791 | (list elements 2 through the end) as the `exprs` argument. Set | |
792 | `env` to the new environment. Continue at the beginning of the loop. | |
793 | ||
794 | Run some manual tests from previous steps to make sure you have not | |
795 | broken anything by adding TCO. | |
796 | ||
797 | Now go to the top level, run the step 5 tests. | |
798 | ||
799 | ``` | |
800 | make test^quux^step5 | |
801 | ``` | |
802 | ||
803 | Look at the step 5 test file `tests/step5_tco.mal`. The `sum-to` | |
804 | function cannot be tail call optimized because it does something after | |
805 | the recursive call (`sum-to` calls itself and then does the addition). | |
806 | Lispers say that the `sum-to` is not in tail position. The `sum2` | |
807 | function however, calls itself from tail position. In other words, the | |
808 | recursive call to `sum2` is the last action that `sum2` does. Calling | |
809 | `sum-to` with a large value will cause a stack overflow exception in | |
810 | most target languages (some have super-special tricks they use to | |
811 | avoid stack overflows). | |
812 | ||
813 | Congratulations, your mal implementation already has a feature (TCO) | |
814 | that most mainstream languages lack. | |
815 | ||
816 | ||
817 | <a name="step6"></a> | |
818 | ||
819 | ### Step 6: Files and Evil | |
820 | ||
821 | ![step6_file architecture](step6_file.png) | |
822 | ||
823 | In step 5 you added tail call optimization. In this step you will add | |
824 | some string and file operations and give your implementation a touch | |
825 | of evil ... er, eval. And as long as your language supports function | |
826 | closures, this step will be quite simple. However, to complete this | |
827 | step, you must implement string type support, so if you have been | |
828 | holding off on that you will need to go back and do so. | |
829 | ||
830 | Compare the pseudocode for step 5 and step 6 to get a basic idea of | |
831 | the changes that will be made during this step: | |
832 | ``` | |
833 | diff -urp ../process/step5_tco.txt ../process/step6_file.txt | |
834 | ``` | |
835 | ||
ffd31966 JM |
836 | * Copy `step5_tco.qx` to `step6_file.qx`. |
837 | ||
0f4ca9d1 JM |
838 | * Add two new string functions to the core namespaces: |
839 | * `read-string`: this function just exposes the `read_str` function | |
840 | from the reader. If your mal string type is not the same as your | |
841 | target language (e.g. statically typed language) then your | |
842 | `read-string` function will need to unbox (extract) the raw string | |
843 | from the mal string type in order to call `read_str`. | |
844 | * `slurp`: this function takes a file name (string) and returns the | |
845 | contents of the file as a string. Once again, if your mal string | |
846 | type wraps a raw target language string, then you will need to | |
847 | unmarshall (extract) the string parameter to get the raw file name | |
848 | string and marshall (wrap) the result back to a mal string type. | |
849 | ||
850 | * In your main program, add a new `eval` (symbol) entry to your REPL | |
851 | environment. The value of the new entry is a regular function | |
852 | closure with a single argument `ast`. The closure calls the real | |
853 | `EVAL` function using the `ast` as the first argument and the REPL | |
854 | environment (closed over from outside) as the second argument. The | |
855 | result of the `EVAL` call is returned. | |
856 | ||
857 | * Define a `load-file` function using mal itself. In your main | |
858 | program call the `rep` function with this string: | |
859 | "(def! load-file (fn* (f) (eval (read-string (str \"(do \" (slurp f) \")\")))))". | |
860 | ||
861 | Try out `load-file`: | |
862 | * `(load-file "../tests/incA.mal")` -> `9` | |
863 | * `(inc4 3)` -> `7` | |
864 | ||
865 | The `load-file` function does the following: | |
866 | * Call `slurp` to read in a file by name. Surround the contents with | |
867 | "(do ...)" so that the whole file will be treated as a single | |
868 | program AST (abstract syntax tree). | |
869 | * Call `read-string` on the string returned from `slurp`. This uses | |
870 | the reader to read/convert the file contents into mal data/AST. | |
871 | * Call `eval` (the one in the REPL environment) on the AST returned | |
872 | from `read-string` to "run" it. | |
873 | ||
874 | Now go to the top level, run the step 6 tests. The optional tests will | |
875 | need support from the reader for comments, vectors and hash-maps: | |
876 | ``` | |
877 | make test^quux^step6 | |
878 | ``` | |
879 | ||
880 | Congratulations, you now have a full-fledged scripting language that | |
881 | can run other mal programs. However, the set of functions that are | |
882 | available (from `core.qx`) is fairly limited. The bulk of the | |
883 | functions you will add are described in step 9, but you will begin to | |
884 | flesh them out over the next few steps to support quoting (step 7) and | |
885 | macros (step 8). | |
886 | ||
887 | ||
888 | #### Optional: | |
889 | ||
890 | * Add the ability to run another mal program from the command line. | |
891 | Prior to the REPL loop, check if your mal implementation is called | |
892 | with command line arguments. If so, treat the first argument as | |
893 | a filename and use `rep` to call `load-file` on that filename, and | |
894 | finally exit/terminate execution. | |
895 | ||
896 | * Add the rest of the command line arguments to your REPL environment | |
897 | so that programs that are run with `load-file` have access to their | |
898 | calling environmnet. Add a new "*ARGV*" (symbol) entry to your REPL | |
899 | environment. The value of this entry should be the rest of the | |
900 | command line arguments as a mal list value. | |
901 | ||
902 | ||
903 | <a name="step7"></a> | |
904 | ||
905 | ### Step 7: Quoting | |
906 | ||
907 | ![step7_quote architecture](step7_quote.png) | |
908 | ||
909 | In step 7 you will add the special forms `quote` and `quasiquote` and | |
910 | add supporting core functions `cons` and `concat`. The two quote forms | |
911 | add a powerful abstraction for manipulating mal code itself | |
912 | (meta-programming). | |
913 | ||
0f4ca9d1 JM |
914 | The `quote` special form indicates to the evaluator (`EVAL`) that the |
915 | parameter should not be evaluated (yet). At first glance, this might | |
916 | not seem particular useful but an example of what this enables is the | |
917 | ability for a mal program to refer to a symbol itself rather than the | |
918 | value that it evaluates to. Likewise with lists. For example, consider | |
919 | the following: | |
920 | ||
921 | * `(prn abc)`: this will lookup the symbol `abc` in the current | |
922 | evaluation environment and print it. This will result in error if | |
923 | `abc` is not defined. | |
924 | * `(prn (quote abc))`: this will print "abc" (prints the symbol | |
925 | itself). This will work regardless of whether `abc` is defined in | |
926 | the current environment. | |
927 | * `(prn (1 2 3))`: this will result in an error because `1` is not | |
928 | a function and cannot be applied to the arguments `(2 3)`. | |
929 | * `(prn (quote (1 2 3)))`: this will print "(1 2 3)". | |
930 | * `(def! l (quote (1 2 3)))`: list quoting allows us to define lists | |
931 | directly in the code (list literal). Another way of doing this is | |
932 | with the list function: `(def! l (list 1 2 3))`. | |
933 | ||
934 | The second special quoting form is `quasiquote`. This allows a quoted | |
935 | list to have internal elements of the list that are temporarily | |
936 | unquoted (normal evaluation). There are two special forms that only | |
937 | mean something within a quasiquoted list: `unquote` and | |
938 | `splice-unquote`. These are perhaps best explained with some examples: | |
939 | ||
940 | * `(def! lst (quote (2 3)))` -> `(2 3)` | |
941 | * `(quasiquote (1 (unquote lst)))` -> `(1 (2 3))` | |
942 | * `(quasiquote (1 (splice-unquote lst)))` -> `(1 2 3)` | |
943 | ||
944 | The `unquote` form turns evaluation back on for its argument and the | |
945 | result of evaluation is put in place into the quasiquoted list. The | |
946 | `splice-unquote` also turns evaluation back on for its argument, but | |
947 | the evaluated value must be a list which is then "spliced" into the | |
948 | quasiquoted list. The true power of the quasiquote form will be | |
949 | manifest when it used together with macros (in the next step). | |
950 | ||
ffd31966 JM |
951 | Compare the pseudocode for step 6 and step 7 to get a basic idea of |
952 | the changes that will be made during this step: | |
953 | ``` | |
954 | diff -urp ../process/step6_file.txt ../process/step7_quote.txt | |
955 | ``` | |
956 | ||
957 | * Copy `step6_file.qx` to `step7_quote.qx`. | |
958 | ||
0f4ca9d1 JM |
959 | * Before implementing the quoting forms, you will need to implement |
960 | * some supporting functions in the core namespace: | |
961 | * `cons`: this function takes a list as its second | |
962 | parameter and returns a new list that has the first argument | |
963 | prepended to it. | |
964 | * `concat`: this functions takes 0 or more lists as | |
965 | parameters and returns a new list that is a concatenation of all | |
966 | the list parameters. | |
967 | ||
968 | An aside on immutability: note that neither cons or concat mutate | |
969 | their original list arguments. Any references to them (i.e. other | |
970 | lists that they may be "contained" in) will still refer to the | |
971 | original unchanged value. Mal, like Clojure, is a language which uses | |
972 | immutable data structures. I encourage you to read about the power and | |
973 | importance of immutability as implemented in Clojure (from which | |
974 | Mal borrows most of its syntax and feature-set). | |
975 | ||
976 | * Add the `quote` special form. This form just returns its argument | |
977 | (the second list element of `ast`). | |
978 | ||
979 | * Add the `quasiquote` special form. First implement a helper function | |
980 | `is_pair` that returns true if the parameter is a non-empty list. | |
981 | Then define a `quasiquote` function. This is called from `EVAL` with | |
982 | the first `ast` argument (second list element) and then `ast` is set | |
983 | to the result and execution continues at the top of the loop (TCO). | |
984 | The `quasiquote` function takes a parameter `ast` and has the | |
985 | following conditional: | |
986 | 1. if `is_pair` of `ast` is false: return a new list containing: | |
987 | a symbol named "quote" and `ast`. | |
988 | 2. else if the first element of `ast` is a symbol named "unquote": | |
989 | return the second element of `ast`. | |
990 | 3. if `is_pair` of first element of `ast` is true and the first | |
991 | element of first element of `ast` (`ast[0][0]`) is a symbol named | |
992 | "splice-unquote": return a new list containing: a symbol named | |
993 | "concat", the second element of first element of `ast` | |
994 | (`ast[0][1]`), and the result of calling `quasiquote` with the | |
995 | second through last element of `ast`. | |
996 | 4. otherwise: return a new list containing: a symbol named "cons", the | |
997 | result of calling `quasiquote` on first element of `ast` | |
998 | (`ast[0]`), and result of calling `quasiquote` with the second | |
999 | through last element of `ast`. | |
1000 | ||
1001 | ||
1002 | Now go to the top level, run the step 7 tests: | |
1003 | ``` | |
1004 | make test^quux^step7 | |
1005 | ``` | |
1006 | ||
1007 | Quoting is one of the more mundane functions available in mal, but do | |
1008 | not let that discourage you. Your mal implementation is almost | |
1009 | complete, and quoting sets the stage for the next very exiting step: | |
1010 | macros. | |
1011 | ||
1012 | ||
1013 | #### Optional | |
1014 | ||
1015 | * The full names for the quoting forms are fairly verbose. Most Lisp | |
1016 | languages have a short-hand syntax and Mal is no exception. These | |
1017 | short-hand syntaxes are known as reader macros because they allow us | |
1018 | to manipulate mal code during the reader phase. Macros that run | |
1019 | during the eval phase are just called "macros" and are described in | |
1020 | the next section. Expand the conditional with reader `read_form` | |
1021 | function to add the following four cases: | |
1022 | * token is "'" (single quote): return a new list that contains the | |
1023 | symbol "quote" and the result of reading the next form | |
1024 | (`read_form`). | |
1025 | * token is "`" (back-tick): return a new list that contains the | |
1026 | symbol "quasiquote" and the result of reading the next form | |
1027 | (`read_form`). | |
1028 | * token is "~" (tilde): return a new list that contains the | |
1029 | symbol "unquote" and the result of reading the next form | |
1030 | (`read_form`). | |
1031 | * token is "~@" (tilde + at sign): return a new list that contains | |
1032 | the symbol "splice-unquote" and the result of reading the next | |
1033 | form (`read_form`). | |
1034 | ||
1035 | * Add support for quoting of vectors. The `is_pair` function should | |
1036 | return true if the argument is a non-empty list or vector. `cons` | |
1037 | should also accept a vector as the second argument. The return value | |
1038 | is a list regardless. `concat` should support concatenation of | |
1039 | lists, vectors, or a mix or both. The result is always a list. | |
1040 | ||
1041 | ||
1042 | <a name="step8"></a> | |
1043 | ||
1044 | ### Step 8: Macros | |
1045 | ||
1046 | ![step8_macros architecture](step8_macros.png) | |
1047 | ||
1048 | Your mal implementation is now ready for one of the most Lispy and | |
1049 | exciting of all programming concepts: macros. In the previous step, | |
1050 | quoting enabled some simple manipulation data structures and therefore | |
1051 | manipulation of mal code (because the `eval` function from step | |
1052 | 6 turns mal data into code). In this step you will be able to mark mal | |
1053 | functions as macros which can manipulate mal code before it is | |
1054 | evaluated. In other words, macros are user-defined special forms. Or | |
1055 | to look at it another way, macros allow mal programs to redefine | |
1056 | the mal language itself. | |
1057 | ||
1058 | Compare the pseudocode for step 7 and step 8 to get a basic idea of | |
1059 | the changes that will be made during this step: | |
1060 | ``` | |
1061 | diff -urp ../process/step7_quote.txt ../process/step8_macros.txt | |
1062 | ``` | |
1063 | ||
ffd31966 JM |
1064 | * Copy `step7_quote.qx` to `step8_macros.qx`. |
1065 | ||
1066 | ||
0f4ca9d1 JM |
1067 | You might think that the infinite power of macros would require some |
1068 | sort of complex mechanism, but the implementation is actually fairly | |
1069 | simple. | |
1070 | ||
1071 | * Add a new attribute `is_macro` to mal function types. This should | |
1072 | default to false. | |
1073 | ||
1074 | * Add a new special form `defmacro!`. This is very similar to the | |
1075 | `def!` form, but before the evaluated value (mal function) is set in | |
1076 | the environment, the `is_macro` attribute should be set to true. | |
1077 | ||
1078 | * Add a `is_macro_call` function: This function takes arguments `ast` | |
1079 | and `env`. It returns true if `ast` is a list that contains a symbol | |
1080 | as the first element and that symbol refers to a function in the | |
1081 | `env` environment and that function has the `is_macro` attribute set | |
1082 | to true. Otherwise, it returns false. | |
1083 | ||
1084 | * Add a `macroexpand` function: This function takes arguments `ast` | |
1085 | and `env`. It calls `is_macro_call` with `ast` and `env` and loops | |
1086 | while that condition is true. Inside the loop, the first element of | |
1087 | the `ast` list (a symbol), is looked up in the environment to get | |
1088 | the macro function. This macro function is then called/applied with | |
1089 | the rest of the `ast` elements (2nd through the last) as arguments. | |
1090 | The return value of the macro call becomes the new value of `ast`. | |
1091 | When the loop completes because `ast` no longer represents a macro | |
1092 | call, the current value of `ast` is returned. | |
1093 | ||
1094 | * In the evaluator (`EVAL`) before the special forms switch (apply | |
1095 | section), perform macro expansion by calling the `macroexpand` | |
1096 | function with the current value of `ast` and `env`. Set `ast` to the | |
1097 | result of that call. If the new value of `ast` is no longer a list | |
1098 | after macro expansion, then return `ast`, otherwise continue with | |
1099 | the rest of the apply section (special forms switch). | |
1100 | ||
1101 | * Add a new special form condition for `macroexpand`. Call the | |
1102 | `macroexpand` function using the first `ast` argument (second list | |
1103 | element) and `env`. Return the result. This special form allows | |
1104 | a mal program to do explicit macro expansion without applying the | |
1105 | result (which can be useful for debugging macro expansion). | |
1106 | ||
1107 | Now go to the top level, run the step 8 tests: | |
1108 | ``` | |
1109 | make test^quux^step8 | |
1110 | ``` | |
1111 | ||
1112 | ||
1113 | ### Optional | |
1114 | ||
1115 | * Add the following new core functions which are frequently used in | |
1116 | macro functions: | |
1117 | * `nth`: this function takes a list (or vector) and a number (index) | |
1118 | as arguments, returns the element of the list at the given index. | |
1119 | If the index is out of range, this function raises an exception. | |
1120 | * `first`: this function takes a list (or vector) as its argument | |
1121 | and return the first element. If the list (or vector) is empty or | |
1122 | is `nil` then `nil` is returned. | |
1123 | * `rest`: this function takes a list (or vector) as its argument and | |
1124 | returns a new list containing all the elements except the first. | |
1125 | ||
1126 | * In the main program, use the `rep` function to define two new | |
1127 | control structures macros. Here are the string arguments for `rep` | |
1128 | to define these macros: | |
1129 | * `cond`: "(defmacro! cond (fn* (& xs) (if (> (count xs) 0) (list 'if (first xs) (if (> (count xs) 1) (nth xs 1) (throw \"odd number of forms to cond\")) (cons 'cond (rest (rest xs)))))))" | |
1130 | * `or`: "(defmacro! or (fn* (& xs) (if (empty? xs) nil (if (= 1 (count xs)) (first xs) `(let* (or_FIXME ~(first xs)) (if or_FIXME or_FIXME (or ~@(rest xs))))))))" | |
1131 | ||
1132 | ||
ffd31966 JM |
1133 | <a name="step9"></a> |
1134 | ||
1135 | ### Step 9: Try | |
1136 | ||
1137 | ![step9_try architecture](step9_try.png) | |
1138 | ||
1139 | Compare the pseudocode for step 8 and step 9 to get a basic idea of | |
1140 | the changes that will be made during this step: | |
1141 | ``` | |
1142 | diff -urp ../process/step8_macros.txt ../process/step9_try.txt | |
1143 | ``` | |
1144 | ||
1145 | * Copy `step8_macros.qx` to `step9_try.qx`. | |
1146 | ||
1147 | * TODO/TBD | |
1148 | ||
1149 | ||
1150 | <a name="step9"></a> | |
1151 | ||
1152 | ### Step A: Interop and Self-hosting | |
1153 | ||
1154 | ![stepA_interop architecture](stepA_interop.png) | |
1155 | ||
1156 | Compare the pseudocode for step 9 and step A to get a basic idea of | |
1157 | the changes that will be made during this step: | |
1158 | ``` | |
1159 | diff -urp ../process/step9_try.txt ../process/stepA_interop.txt | |
1160 | ``` | |
1161 | ||
1162 | * Copy `step9_try.qx` to `stepA_interop.qx`. | |
1163 | ||
1164 | * TODO/TBD | |
1165 | ||
1166 | ||
0f4ca9d1 JM |
1167 | ## TODO: |
1168 | ||
1169 | * simplify: "X argument (list element Y)" -> ast[Y] | |
1170 | * step 8 summary (power of macros, warning about macros, almost to | |
1171 | self-hosting) | |
1172 | * step 9 | |
1173 | * step A | |
1174 | * more info on hash-map and keyword implementation. Hash-maps just | |
1175 | need to support string keys. | |
1176 | * list of types with metadata: list, vector, hash-map, mal functions | |
1177 | * more clarity about when to peek and poke in read_list and read_form | |
1178 | * tokenizer: use first group rather than whole match (to eliminate | |
1179 | whitespace/commas) |