Cleanup typos in section "Step 4: If Fn Do"
[jackhill/mal.git] / process / guide.md
CommitLineData
0f4ca9d1
JM
1# The Make-A-Lisp Process
2
3So you want to write a Lisp interpreter? Welcome!
4
5The goal of the Make-A-Lisp project is to make it easy to write your
6own Lisp interpreter without sacrificing those many "Aha!" moments
7that come from ascending the McCarthy mountain. When you reach the peak
8of this particular mountain, you will have an interpreter for the mal
9Lisp language that is powerful enough to be self-hosting, meaning it
10will be able to run a mal interpreter written in mal itself.
11
12So jump right in (er ... start the climb)!
13
14## Pick a language
15
16You might already have a language in mind that you want to use.
17Technically speaking, mal can be implemented in any sufficiently
c10dcb94 18complete programming language (i.e. Turing complete), however, there are a few
0f4ca9d1
JM
19language features that can make the task MUCH easier. Here are some of
20them in rough order of importance:
21
22* A sequential compound data structure (e.g. arrays, lists,
23 vectors, etc)
24* An associative compound data structure (e.g. a dictionary,
25 hash-map, associative array, etc)
26* Function references (first class functions, function pointers,
27 etc)
28* Real exception handling (try/catch, raise, throw, etc)
29* Variable argument functions (variadic, var args, splats, apply, etc)
30* Function closures
31* PCRE regular expressions
32
33In addition, the following will make your task especially easy:
34
bf518367
JM
35* Dynamic typing / boxed types (specifically, the ability to store
36 different data types in the sequential and associative structures
37 and the language keeps track of the type for you)
0f4ca9d1
JM
38* Compound data types support arbitrary runtime "hidden" data
39 (metadata, metatables, dynamic fields attributes)
40
41Here are some examples of languages that have all of the above
42features: JavaScript, Ruby, Python, Lua, R, Clojure.
43
44Many of the most popular languages already have Mal implementations.
45However, this should not discourage you from creating your own
46implementation in a language that already has one. However, if you go
47this route, I suggest you avoid referring to the existing
48implementations (i.e. "cheating") to maximize your learning experience
49instead of just borrowing mine. On the other hand, if your goal is to
50add new implementations to mal as efficiently as possible, then you
51SHOULD find the most similar target language implementation and refer
52to it frequently.
53
bf518367
JM
54If you want a fairly long list of programming languages with an
55approximate measure of popularity, try the [Programming Language
56Popularity Chart](http://langpop.corger.nl/)
57
0f4ca9d1
JM
58
59## Getting started
60
61* Install your chosen language interpreter/compiler, language package
62 manager and build tools (if applicable)
63
64* Fork the mal repository on github and then clone your forked
65 repository:
66```
67git clone git@github.com:YOUR_NAME/mal.git
68cd mal
69```
70
fc03712f
EC
71* Make a new directory for your implementation. For example, if your
72language is called "quux":
0f4ca9d1
JM
73```
74mkdir quux
75```
76
77* Modify the top level Makefile to allow the tests to be run against
78 your implementation. For example, if your language is named "quux"
79 and uses "qx" as the file extension, then make the following
80 3 modifications to Makefile:
81```
82IMPLS = ... quux ...
83...
84quux_STEP_TO_PROG = mylang/$($(1)).qx
85...
86quux_RUNSTEP = ../$(2) $(3)
87```
88
89This allows you to run tests against your implementation like this:
90```
91make test^quux^stepX
92```
93
94
95## General hints
96
97Stackoverflow and Google are your best friends. Modern polyglot
98developers do not memorize dozens of programming languages. Instead,
99they learn the peculiar terminology used with each language and then
100use this to search for their answers.
101
102Here are some other resources where multiple languages are
103compared/described:
104* http://learnxinyminutes.com/
105* http://hyperpolyglot.org/
106* http://rosettacode.org/
334a71b6 107* http://rigaux.org/language-study/syntax-across-languages/
0f4ca9d1
JM
108
109Do not let yourself be bogged down by specific problems. While the
110make-a-lisp process is structured as a series of steps, the reality is
111that building a lisp interpreter is more like a branching tree. If you
112get stuck on tail call optimization, or hash-maps, move on to other
113things. You will often have a stroke of inspiration for a problem as
114you work through other functionality. I have tried to structure this
45a8b3ca
JM
115guide and the tests to make clear which things can be deferred until
116later.
117
118An aside on deferrable/optional bits: when you run the tests for
119a given step, the last tests are often marked with an "optional"
120header. This indicates that these are tests for functionality that is
121not critical to finish a basic mal implementation. Many of the steps
122in this process guide have a "Deferrable" section, however, it is not
123quite the same meaning. Those sections include the functionality that
124is marked as optional in the tests, but they also include
0f4ca9d1
JM
125functionality that becomes mandatory at a later step. In other words,
126this is a "make your own Lisp adventure".
127
128Use test driven development. Each step of the make-a-lisp process has
129a bunch of tests associated with it and there is an easy script to run
130all the tests for a specific step in the process. Pick a failing test,
131fix it, repeat until all the tests for that step pass.
132
133The `process` directory contains abbreviated pseudocode and
134architecture images for each step of the make-a-lisp process. Use
135a textual diff/comparison tool to compare the previous pseudocode step
136with the one you are working on. The architecture images have changes
137from the previous step highlighted in red.
138
daa1cf3f
JM
139If you get completely stuck and are feeling like giving up, then you
140should "cheat" by referring to the same step or functionality in
141a existing implementation language. You are here to learn, not to take
142a test, so do not feel bad about it. Okay, you should feel a little
143bit bad about it.
0f4ca9d1
JM
144
145
146## The Make-A-Lisp Process
147
148In the steps that follow the name of the target language is "quux" and
149the file extension for that language is "qx".
150
151
152<a name="step0"></a>
153
154### Step 0: The REPL
155
156![step0_repl architecture](step0_repl.png)
157
158This step is basically just creating a skeleton of your interpreter.
159
160* Create a `step0_repl.qx` file in `quux/`.
161
162* Add the 4 trivial functions `READ`, `EVAL`, `PRINT`, and `rep`
163 (read-eval-print). `READ`, `EVAL`, and `PRINT` are basically just
164 stubs that return their first parameter (a string if your target
165 language is a statically typed) and `rep` calls them in order
166 passing the return to the input of the next.
167
58ba5af4
JM
168* Add a main loop that repeatedly prints a prompt (needs to be
169 "user> " for later tests to pass), gets a line of input from the
170 user, calls `rep` with that line of input, and then prints out the
171 result from `rep`. It should also exit when you send it an EOF
172 (often Ctrl-D).
0f4ca9d1
JM
173
174* If you are using a compiled (ahead-of-time rather than just-in-time)
175 language, then create a Makefile (or appropriate project definition
176 file) in your directory.
177
daa1cf3f
JM
178It is time to run your first tests. This will check that your program
179does input and output in a way that can be captured by the test
180harness. Go to the top level and run the following:
181```
182make test^quux^step0
183```
0f4ca9d1
JM
184
185Add and then commit your new `step0_repl.qx` and `Makefile` to git.
186
187Congratulations! You have just completed the first step of the
188make-a-lisp process.
189
190
bd62ff74 191#### Optional:
0f4ca9d1
JM
192
193* Add full line editing and command history support to your
194 interpreter REPL. Many languages have a library/module that provide
195 line editing support. Another option if your language supports it is
196 to use an FFI (foreign function interface) to load and call directly
197 into GNU readline, editline, or libnoise library. Add line
198 editing interface code to `readline.qx`
199
200
201<a name="step1"></a>
202
203### Step 1: Read and Print
204
205![step1_read_print architecture](step1_read_print.png)
206
207In this step, your interpreter will "read" the string from the user
208and parse it into an internal tree data structure (an abstract syntax
209tree) and then take that data structure and "print" it back to
210a string.
211
212In non-lisp languages, this step (called "lexing and parsing") can be
213one of the most complicated parts of the compiler/interpreter. In
214Lisp, the data structure that you want in memory is basically
215represented directly in the code that the programmer writes
216(homoiconicity).
217
218For example, if the string is "(+ 2 (* 3 4))" then the read function
219will process this into a tree structure that looks like this:
220```
221 List
222 / | \
223 / | \
224 / | \
225 Sym:+ Int:2 List
226 / | \
227 / | \
228 / | \
229 Sym:* Int:3 Int:4
230```
231
232Each left paren and its matching right paren (lisp "sexpr") becomes
233a node in the tree and everything else becomes a leaf in the tree.
234
235If you can find code for an implementation of a JSON encoder/decoder
236in your target language then you can probably just borrow and modify
237that and be 75% of the way done with this step.
238
239The rest of this section is going to assume that you are not starting
240from an existing JSON encoder/decoder, but that you do have access to
241a Perl compatible regular expressions (PCRE) module/library. You can
242certainly implement the reader using simple string operations, but it
243is more involved. The `make`, `ps` (postscript) and Haskell
244implementations have examples of a reader/parser without using regular
245expression support.
246
247* Copy `step0_repl.qx` to `step1_read_print.qx`.
248
249* Add a `reader.qx` file to hold functions related to the reader.
250
251* If the target language has objects types (OOP), then the next step
252 is to create a simple stateful Reader object in `reader.qx`. This
253 object will store the tokens and a position. The Reader object will
254 have two methods: `next` and `peek`. `next` returns the tokens at
255 the current position and increments the position. `peek` just
256 returns the token at the current position.
257
258* Add a function `read_str` in `reader.qx`. This function
259 will call `tokenizer` and then create a new Reader object instance
260 with the tokens. Then it will call `read_form` with the Reader
261 instance.
262
263* Add a function `tokenizer` in `reader.qx`. This function will take
2315fd53 264 a single string and return an array/list
0f4ca9d1
JM
265 of all the tokens (strings) in it. The following regular expression
266 (PCRE) will match all mal tokens.
267```
268[\s,]*(~@|[\[\]{}()'`~^@]|"(?:\\.|[^\\"])*"|;.*|[^\s\[\]{}('"`,;)]*)
269```
270
271* Add the function `read_form` to `reader.qx`. This function
272 will peek at the first token in the Reader object and switch on the
273 first character of that token. If the character is a left paren then
274 `read_list` is called with the Reader object. Otherwise, `read_atom`
275 is called with the Reader Object. The return value from `read_form`
276 is a mal data type. If your target language is statically typed then
277 you will need some way for `read_form` to return a variant or
278 subclass type. For example, if your language is object oriented,
2315fd53 279 then you can define a top level MalType (in `types.qx`) that all
0f4ca9d1
JM
280 your mal data types inherit from. The MalList type (which also
281 inherits from MalType) will contains a list/array of other MalTypes.
282 If your language is dynamically typed then you can likely just
283 return a plain list/array of other mal types.
284
285* Add the function `read_list` to `reader.qx`. This function will
286 repeatedly call `read_form` with the Reader object until it
287 encounters a ')' token (if it reach EOF before reading a ')' then
288 that is an error). It accumulates the results into a List type. If
289 your language does not have a sequential data type that can hold mal
290 type values you may need to implement one (in `types.qx`). Note
291 that `read_list` repeatedly calls `read_form` rather than
292 `read_atom`. This mutually recursive defintion between `read_list`
293 and `read_form` is what allows lists to contain lists.
294
295* Add the function `read_atom` to `reader.qx`. This function will
296 look at the contents of the token and return the appropriate scalar
297 (simple/single) data type value. Initially, you can just implement
298 numbers (integers) and symbols . This will allow you to proceed
299 through the next couple of steps before you will need to implement
300 the other fundamental mal types: nil, true, false, and string. The
301 remaining mal types: keyword, vector, hash-map, and atom do not
302 need to be implemented until step 9 (but can be implemented at any
303 point between this step and that). BTW, symbols types are just an
304 object that contains a single string name value (some languages have
305 symbol types already).
306
307* Add a file `printer.qx`. This file will contain a single function
308 `pr_str` which does the opposite of `read_str`: take a mal data
309 structure and return a string representation of it. But `pr_str` is
310 much simpler and is basically just a switch statement on the type of
311 the input object:
312
313 * symbol: return the string name of the symbol
314 * number: return the number as a string
315 * list: iterate through each element of the list calling `pr_str` on
316 it, then join the results with a space separator, and surround the
317 final result with parens
318
319* Change the `READ` function in `step1_read_print.qx` to call
320 `reader.read_str` and the `PRINT` function to call `printer.pr_str`.
321 `EVAL` continues to simply return its input but the type is now
322 a mal data type.
323
324You now have enough hooked up to begin testing your code. You can
325manually try some simple inputs:
326 * `123` -> `123`
327 * ` 123 ` -> `123`
328 * `abc` -> `abc`
329 * ` abc ` -> `abc`
330 * `(123 456)` -> `(123 456)`
331 * `( 123 456 789 ) ` -> `(123 456 789)`
6a767b0d 332 * `( + 2 (* 3 4) ) ` -> `(+ 2 (* 3 4))`
0f4ca9d1
JM
333
334To verify that your code is doing more than just eliminating extra
335spaces (and not failing), you can instrument your `reader.qx` functions.
336
337Once you have gotten past those simple manual tests, it is time to run
338the full suite of step 1 tests. Go to the top level and run the
339following:
340```
341make test^quux^step1
342```
343
344Fix any test failures related to symbols, numbers and lists.
345
346Depending on the functionality of your target language, it is likely
347that you have now just completed one of the most difficult steps. It
348is down hill from here. The remaining steps will probably be easier
349and each step will give progressively more bang for the buck.
350
45a8b3ca 351#### Deferrable:
0f4ca9d1
JM
352
353
354* Add error checking to your reader functions to make sure parens
355 are properly matched. Catch and print these errors in your main
356 loop. If your language does not have try/catch style bubble up
357 exception handling, then you will need to add explicit error
358 handling to your code to catch and pass on errors without crashing.
359
360* Add support for the other basic data type to your reader and printer
361 functions: string, nil, true, and false. These become mandatory at
362 step 4. When a string is read, a slash followed by a doublequote is
943f21c4 363 translated into a plain doublequote character and a slash followed by
0f4ca9d1
JM
364 "n" is translated into a newline. To properly print a string (for
365 step 4 string functions), the `pr_str` function needs another
366 parameter called `print_readably`. When `print_readably` is true,
367 doublequotes and newlines are translated into their printed
368 representations (the reverse of the reader). The `PRINT` function in
369 the main program should call `pr_str` with print_readably set to
370 true.
371
372* Add support for the other mal types: keyword, vector, hash-map, and
2345a3da
JM
373 atom.
374 * keyword: a keyword is a token that begins with a colon. A keyword
375 can just be stored as a string with special unicode prefix like
376 0x29E (or char 0xff/127 if the target language does not have good
377 unicode support) and the printer translates strings with that
378 prefix back to the keyword representation. This makes it easy to
379 use keywords as hash map keys in most languages. You can also
380 store keywords as a unique data type, but you will need to make
381 sure they can be used as hash map keys (which may involve doing
382 a similar prefixed translation anyways).
383 * vector: a vector can be implemented with same underlying
384 type as a list as long as there is some mechanism to keep track of
385 the difference. You can use the same reader function for both
386 lists and vectors by adding parameters for the starting and ending
387 tokens.
388 * hash-map: a hash-map is an associative data structure that maps
389 strings to other mal values. If you implement keywords as prefixed
390 strings, then you only need a native associative data structure
391 which supports string keys. Clojure allows any value to be a hash
392 map key, but the base functionality in mal is to support strings
393 and keyword keys. Because of the representation of hash-maps as
394 an alternating sequence of keys and values, you can probably use
395 the same reader function for hash-maps as lists and vectors with
396 parameters to indicate the starting and ending tokens. The odd
397 tokens are then used for keys with the corresponding even tokens
398 as the values.
399
400* Add support for reader macros which are forms that are
401 transformed into other forms during the read phase. Refer to
402 `tests/step1_read_print.mal` for the form that these macros should
403 take (they are just simple transformations of the token stream).
0f4ca9d1
JM
404
405* Add comment support to your reader. The tokenizer should ignore
406 tokens that start with ";". Your `read_str` function will need to
407 properly handle when the tokenizer returns no values. The simplest
408 way to do this is to return `nil` mal value. A cleaner option (that
409 does not print `nil` at the prompt is to throw a special exception
410 that causes the main loop to simply continue at the beginning of the
411 loop without calling `rep`.
412
413
414<a name="step2"></a>
415
416### Step 2: Eval
417
418![step2_eval architecture](step2_eval.png)
419
420In step 1 your mal interpreter was basically just a way to validate
421input and eliminate extraneous white space. In this step you will turn
422your interpreter into a simple number calculator by adding
423functionality to the evaluator (`EVAL`).
424
425Compare the pseudocode for step 1 and step 2 to get a basic idea of
426the changes that will be made during this step:
427```
428diff -urp ../process/step1_read_print.txt ../process/step2_eval.txt
429```
430
431* Copy `step1_read_print.qx` to `step2_eval.qx`.
432
433* Define a simple initial REPL environment. This environment is an
434 associative structure that maps symbols (or symbol names) to
435 numeric functions. For example, in python this would look something
436 like this:
437```
438repl_env = {'+': lambda a,b: a+b,
439 '-': lambda a,b: a-b,
440 '*': lambda a,b: a*b,
441 '/': lambda a,b: int(a/b)}
442```
443
444* Modify the `rep` function to pass the REPL environment as the second
445 parameter for the `EVAL` call.
446
447* Create a new function `eval_ast` which takes `ast` (mal data type)
448 and an associative structure (the environment from above).
449 `eval_ast` switches on the type of `ast` as follows:
450
451 * symbol: lookup the symbol in the environment structure and return
452 the value or raise an error no value is found
453 * list: return a new list that is the result of calling `EVAL` on
454 each of the members of the list
455 * otherwise just return the original `ast` value
456
457* Modify `EVAL` to check if the first parameter `ast` is a list.
458 * `ast` is not a list: then return the result of calling `eval_ast`
459 on it.
460 * `ast` is a list: call `eval_ast` to get a new evaluated list. Take
461 the first item of the evaluated list and call it as function using
462 the rest of the evaluated list as its arguments.
463
464If your target language does not have full variable length argument
465support (e.g. variadic, vararg, splats, apply) then you will need to
466pass the full list of arguments as a single parameter and split apart
467the individual values inside of every mal function. This is annoying,
468but workable.
469
470The process of taking a list and invoking or executing it to return
471something new is known in Lisp as the "apply" phase.
472
473Try some simple expressions:
474
475 * `(+ 2 3)` -> `5`
476 * `(+ 2 (* 3 4))` -> `14`
477
478The most likely challenge you will encounter is how to properly call
479a function references using an arguments list.
480
481Now go to the top level, run the step 2 tests and fix the errors.
482```
483make test^quux^step2
484```
485
486You now have a simple prefix notation calculator!
487
488
489<a name="step3"></a>
490
491### Step 3: Environments
492
493![step3_env architecture](step3_env.png)
494
495In step 2 you were already introduced to REPL environment (`repl_env`)
496where the basic numeric functions were stored and looked up. In this
497step you will add the ability to create new environments (`let*`) and
daa1cf3f 498modify existing environments (`def!`).
0f4ca9d1
JM
499
500A Lisp environment is an associative data structure that maps symbols (the
501keys) to values. But Lisp environments have an additional important
502function: they can refer to another environment (the outer
503environment). During environment lookups, if the current environment
504does not have the symbol, the lookup continues in the outer
505environment, and continues this way until the symbol is either found,
506or the outer environment is `nil` (the outermost environment in the
507chain).
508
509Compare the pseudocode for step 2 and step 3 to get a basic idea of
510the changes that will be made during this step:
511```
512diff -urp ../process/step2_eval.txt ../process/step3_env.txt
513```
514
d9c020b0 515* Copy `step2_eval.qx` to `step3_env.qx`.
0f4ca9d1
JM
516
517* Create `env.qx` to hold the environment definition.
518
519* Define an `Env` object that is instantiated with a single `outer`
520 parameter and starts with an empty associative data structure
521 property `data`.
522
523* Define three methods for the Env object:
524 * set: takes a symbol key and a mal value and adds to the `data`
525 structure
526 * find: takes a symbol key and if the current environment contains
527 that key then return the environment. If no key is found and outer
528 is not `nil` then call find (recurse) on the outer environment.
529 * get: takes a symbol key and uses the `find` method to locate the
530 environment with the key, then returns the matching value. If no
531 key is found up the outer chain, then throws/raises a "not found"
532 error.
533
169ddeb2 534* Update `step3_env.qx` to use the new `Env` type to create the
0f4ca9d1
JM
535 repl_env (with a `nil` outer value) and use the `set` method to add
536 the numeric functions.
537
538* Modify `eval_ast` to call the `get` method on the `env` parameter.
539
540* Modify the apply section of `EVAL` to switch on the first element of
541 the list:
542 * symbol "def!": call the set method of the current environment
543 (second parameter of `EVAL` called `env`) using the unevaluated
544 first parameter (second list element) as the symbol key and the
545 evaluated second parameter as the value.
546 * symbol "let*": create a new environment using the current
547 environment as the outer value and then use the first parameter as
0a63c9f9 548 a list of new bindings in the "let*" environment. Take the second
0f4ca9d1
JM
549 element of the binding list, call `EVAL` using the new "let*"
550 environment as the evaluation environment, then call `set` on the
0a63c9f9 551 "let*" environment using the first binding list element as the key
0f4ca9d1
JM
552 and the evaluated second element as the value. This is repeated
553 for each odd/even pair in the binding list. Note in particular,
554 the bindings earlier in the list can be referred to by later
555 bindings. Finally, the second parameter (third element) of the
556 original `let*` form is evaluated using the new "let*" environment
557 and the result is returned as the result of the `let*` (the new
558 let environment is discarded upon completion).
559 * otherwise: call `eval_ast` on the list and apply the first element
560 to the rest as before.
561
562`def!` and `let*` are Lisp "specials" (or "special atoms") which means
563that they are language level features and more specifically that the
564rest of the list elements (arguments) may be evaluated differently (or
565not at all) unlike the default apply case where all elements of the
566list are evaluated before the first element is invoked. Lists which
567contain a "special" as the first element are known as "special forms".
568The are special because the follow special evaluation rules.
569
570Try some simple environment tests:
571
572 * `(def! a 6)` -> `6`
573 * `a` -> `6`
574 * `(def! b (+ a 2))` -> `8`
575 * `(+ a b)` -> `14`
576 * `(let* (c 2) c)` -> `2`
577
578Now go to the top level, run the step 3 tests and fix the errors.
579```
580make test^quux^step3
581```
582
583You mal implementation is still basically just a numeric calculator
584with save/restore capability. But you have set the foundation for step
5854 where it will begin to feel like a real programming language.
586
587
588An aside on mutation and typing:
589
590The "!" suffix on symbols is used to indicate that this symbol refers
591to a function that mutates something else. In this case, the `def!`
592symbol indicates a special form that will mutate the current
593environment. Many (maybe even most) of runtime problems that are
594encountered in software engineering are a result of mutation. By
595clearly marking code where mutation may occur, you can more easily
596track down the likely cause of runtime problems when they do occur.
597
598Another cause of runtime errors is type errors, where a value of one
599type is unexpectedly treated by the program as a different and
600incompatible type. Statically typed languages try to make the
601programmer solve all type problems before the program is allowed to
602run. Most Lisp variants tend to be dynamically typed (types of values
603are checked when they are actually used at runtime).
604
605As an aside-aside: The great debate between static and dynamic typing
606debate can be understood by following the money. Advocates of strict
607static typing use words like "correctness" and "safety" and thus get
608government and academic funding. Advocates of dynamic typing use words
609like "agile" and "time-to-market" and thus get venture capital and
610commercial funding.
611
612
613<a name="step4"></a>
614
615### Step 4: If Fn Do
616
617![step4_if_fn_do architecture](step4_if_fn_do.png)
618
619In step 3 you added environments and the special forms for
620manipulating environments. In this step you will add 3 new special
621forms (`if`, `fn*` and `do`) and add several more core functions to
622the default REPL environment. Our new architecture will look like
623this:
624
625The `fn*` special form is how new user-defined functions are created.
626In some Lisps, this special form is named "lambda".
627
628Compare the pseudocode for step 3 and step 4 to get a basic idea of
629the changes that will be made during this step:
630```
631diff -urp ../process/step3_env.txt ../process/step4_if_fn_do.txt
632```
633
634* Copy `step3_env.qx` to `step4_if_fn_do.qx`.
635
636* If you have not implemented reader and printer support (and data
637 types) for `nil`, `true` and `false`, you will need to do so for
638 this step.
639
640* Update the constructor/initializer for environments to take two new
641 arguments: `binds` and `exprs`. Bind (`set`) each element (symbol)
642 of the binds list to the respective element of the `exprs` list.
643
644* Add support to `printer.qx` to print functions values. A string
645 literal like "#<function>" is sufficient.
646
1af1f7f1 647* Add the following special forms to `EVAL`:
0f4ca9d1 648
1af1f7f1 649 * `do`: Evaluate all the elements of the list using `eval_ast`
f558a0c8 650 and return the final evaluated element.
0f4ca9d1
JM
651 * `if`: Evaluate the first parameter (second element). If the result
652 (condition) is anything other than `nil` or `false`, then evaluate
1af1f7f1 653 the second parameter (third element of the list) and return the
0f4ca9d1
JM
654 result. Otherwise, evaluate the third parameter (fourth element)
655 and return the result. If condition is false and there is no third
656 parameter, then just return `nil`.
657 * `fn*`: Return a new function closure. The body of that closure
658 does the following:
659 * Create a new environment using `env` (closed over from outer
660 scope) as the `outer` parameter, the first parameter (second
661 list element of `ast` from the outer scope) as the `binds`
662 parameter, and the parameters to the closure as the `exprs`
663 parameter.
664 * Call `EVAL` on the second parameter (third list element of `ast`
665 from outer scope), using the new environment. Use the result as
666 the return value of the closure.
667
668If your target language does not support closures, then you will need
669to implement `fn*` using some sort of structure or object that stores
670the values being closed over: the first and second elements of the
671`ast` list (function parameter list and function body) and the current
672environment `env`. In this case, your native functions will need to be
673wrapped in the same way. You will probably also need a method/function
674that invokes your function object/structure for the default case of
675the apply section of `EVAL`.
676
677Try out the basic functionality you have implemented:
678
679 * `(fn* [a] a)` -> `#<function>`
680 * `( (fn* [a] a) 7)` -> `7`
681 * `( (fn* [a] (+ a 1)) 10)` -> `11`
682 * `( (fn* [a b] (+ a b)) 2 3)` -> `5`
683
684* Add a new file `core.qx` and define an associative data structure
685 `ns` (namespace) that maps symbols to functions. Move the numeric
686 function definitions into this structure.
687
688* Modify `step4_if_fn_do.qx` to iterate through the `core.ns`
689 structure and add (`set`) each symbol/function mapping to the
690 REPL environment (`repl_env`).
691
692* Add the following functions to `core.ns`:
693 * `list`: take the parameters and return them as a list.
694 * `list?`: return true if the first parameter is a list, false
695 otherwise.
696 * `empty?`: treat the first parameter as a list and return true if
697 the list is empty and false if it contains any elements.
698 * `count`: treat the first parameter as a list and return the number
699 of elements that it contains.
700 * `=`: compare the first two parameters and return true if they are
701 the same type and contain the same value. In the case of equal
702 length lists, each element of the list should be compared for
703 equality and if they are the same return true, otherwise false.
704 * `<`, `<=`, `>`, and `>=`: treat the first two parameters as
705 numbers and do the corresponding numeric comparison, returning
706 either true or false.
707
708Now go to the top level, run the step 4 tests. There are a lot of
709tests in step 4 but all of the non-optional tests that do not involve
710strings should be able to pass now.
711
712```
713make test^quux^step4
714```
715
716Your mal implementation is already beginning to look like a real
717language. You have flow control, conditionals, user-defined functions
718with lexical scope, side-effects (if you implement the string
1380d8c1
JM
719functions), etc. However, our little interpreter has not quite reached
720Lisp-ness yet. The next several steps will take your implementation
721from a neat toy to a full featured language.
0f4ca9d1 722
45a8b3ca 723#### Deferrable:
0f4ca9d1
JM
724
725* Implement Clojure-style variadic function parameters. Modify the
726 constructor/initializer for environments, so that if a "&" symbol is
727 encountered in the `binds` list, the next symbol in the `binds` list
728 after the "&" is bound to the rest of the `exprs` list that has not
729 been bound yet.
730
731* Defines a `not` function using mal itself. In `step4_if_fn_do.qx`
732 call the `rep` function with this string:
733 "(def! not (fn* (a) (if a false true)))".
734
735* Implement the strings functions in `core.qx`. To implement these
736 functions, you will need to implement the string support in the
45a8b3ca 737 reader and printer (deferrable section of step 1). Each of the string
0f4ca9d1
JM
738 functions takes multiple mal values, prints them (`pr_str`) and
739 joins them together into a new string.
740 * `pr-str`: calls `pr_str` on each argument with `print_readably`
741 set to true, joins the results with " " and returns the new
742 string.
743 * `str`: calls `pr_str` on each argument with `print_readably` set
744 to false, concatenates the results together ("" separator), and
745 returns the new string.
746 * `prn`: calls `pr_str` on each argument with `print_readably` set
747 to true, joins the results with " ", prints the string to the
748 screen and then returns `nil`.
749 * `println`: calls `pr_str` on each argument with `print_readably` set
750 to false, joins the results with " ", prints the string to the
751 screen and then returns `nil`.
752
753
754<a name="step5"></a>
755
756### Step 5: Tail call optimization
757
758![step5_tco architecture](step5_tco.png)
759
760In step 4 you added special forms `do`, `if` and `fn*` and you defined
761some core functions. In this step you will add a Lisp feature called
762tail call optimization (TCO). Also called "tail recursion" or
763sometimes just "tail calls".
764
765Several of the special forms that you have defined in `EVAL` end up
766calling back into `EVAL`. For those forms that call `EVAL` as the last
767thing that they do before returning (tail call) you will just loop back
768to the beginning of eval rather than calling it again. The advantage
769of this approach is that it avoids adding more frames to the call
770stack. This is especially important in Lisp languages because they do
771not tend to have iteration control structures preferring recursion
772instead. However, with tail call optimization, recursion can be made
773as stack efficient as iteration.
774
775Compare the pseudocode for step 4 and step 5 to get a basic idea of
776the changes that will be made during this step:
777```
778diff -urp ../process/step4_if_fn_do.txt ../process/step5_tco.txt
779```
780
781* Copy `step4_env.qx` to `step5_tco.qx`.
782
783* Add a loop (e.g. while true) around all code in `EVAL`.
784
785* Modify each of the following form cases to add tail call recursion
786 support:
787 * `let*`: remove the final `EVAL` call on the second `ast` argument
788 (third list element). Set `env` (i.e. the local variable passed in
789 as second parameter of `EVAL`) to the new let environment. Set
790 `ast` (i.e. the local variable passed in as first parameter of
791 `EVAL`) to be the second `ast` argument. Continue at the beginning
792 of the loop (no return).
793 * `do`: change the `eval_ast` call to evaluate all the parameters
4e7296f9
JM
794 except for the last (2nd list element up to but not including
795 last). Set `ast` to the last element of `ast`. Continue
0f4ca9d1
JM
796 at the beginning of the loop (`env` stays unchanged).
797 * `if`: the condition continues to be evaluated, however, rather
798 than evaluating the true or false branch, `ast` is set to the
799 unevaluated value of the chosen branch. Continue at the beginning
800 of the loop (`env` is unchanged).
801
802* The return value from the `fn*` special form will now become an
803 object/structure with attributes that allow the default invoke case
804 of `EVAL` to do TCO on mal functions. Those attributes are:
0f4ca9d1
JM
805 * `ast`: the second `ast` argument (third list element) representing
806 the body of the function.
807 * `params`: the first `ast` argument (second list element)
808 representing the parameter names of the function.
809 * `env`: the current value of the `env` parameter of `EVAL`.
4e7296f9
JM
810 * `fn`: the original function value (i.e. what was return by `fn*`
811 in step 4). Note that this is deferrable until step 9 when it is
812 needed for the `map` and `apply` core functions).
0f4ca9d1
JM
813
814* The default "apply"/invoke case of `EVAL` must now be changed to
815 account for the new object/structure returned by the `fn*` form.
816 Continue to call `eval_ast` on `ast`. The first element is `f`.
817 Switch on the type of `f`:
818 * regular function (not one defined by `fn*`): apply/invoke it as
4e7296f9 819 before (in step 4).
0f4ca9d1
JM
820 * a `fn*` value: set `ast` to the `ast` attribute of `f`. Generate
821 a new environment using the `env` and `params` attributes of `f`
822 as the `outer` and `binds` arguments and rest `ast` arguments
823 (list elements 2 through the end) as the `exprs` argument. Set
824 `env` to the new environment. Continue at the beginning of the loop.
825
826Run some manual tests from previous steps to make sure you have not
827broken anything by adding TCO.
828
829Now go to the top level, run the step 5 tests.
830
831```
832make test^quux^step5
833```
834
835Look at the step 5 test file `tests/step5_tco.mal`. The `sum-to`
836function cannot be tail call optimized because it does something after
837the recursive call (`sum-to` calls itself and then does the addition).
838Lispers say that the `sum-to` is not in tail position. The `sum2`
839function however, calls itself from tail position. In other words, the
840recursive call to `sum2` is the last action that `sum2` does. Calling
841`sum-to` with a large value will cause a stack overflow exception in
842most target languages (some have super-special tricks they use to
843avoid stack overflows).
844
845Congratulations, your mal implementation already has a feature (TCO)
846that most mainstream languages lack.
847
848
849<a name="step6"></a>
850
851### Step 6: Files and Evil
852
853![step6_file architecture](step6_file.png)
854
855In step 5 you added tail call optimization. In this step you will add
856some string and file operations and give your implementation a touch
857of evil ... er, eval. And as long as your language supports function
858closures, this step will be quite simple. However, to complete this
859step, you must implement string type support, so if you have been
860holding off on that you will need to go back and do so.
861
862Compare the pseudocode for step 5 and step 6 to get a basic idea of
863the changes that will be made during this step:
864```
865diff -urp ../process/step5_tco.txt ../process/step6_file.txt
866```
867
ffd31966
JM
868* Copy `step5_tco.qx` to `step6_file.qx`.
869
0f4ca9d1
JM
870* Add two new string functions to the core namespaces:
871 * `read-string`: this function just exposes the `read_str` function
872 from the reader. If your mal string type is not the same as your
873 target language (e.g. statically typed language) then your
874 `read-string` function will need to unbox (extract) the raw string
875 from the mal string type in order to call `read_str`.
876 * `slurp`: this function takes a file name (string) and returns the
877 contents of the file as a string. Once again, if your mal string
878 type wraps a raw target language string, then you will need to
879 unmarshall (extract) the string parameter to get the raw file name
880 string and marshall (wrap) the result back to a mal string type.
881
6fef8e58
JM
882* In your main program, add a new symbol "eval" to your REPL
883 environment. The value of this new entry is a function that takes
884 a single argument `ast`. The closure calls the your `EVAL` function
885 using the `ast` as the first argument and the REPL environment
886 (closed over from outside) as the second argument. The result of
887 the `EVAL` call is returned. This simple but powerful addition
888 allows your program to treat mal data as a mal program. For example,
889 you can now to this:
890```
891(def! mal-prog (list + 1 2))
892(eval mal-prog)
893```
0f4ca9d1
JM
894
895* Define a `load-file` function using mal itself. In your main
896 program call the `rep` function with this string:
897 "(def! load-file (fn* (f) (eval (read-string (str \"(do \" (slurp f) \")\")))))".
898
899Try out `load-file`:
900 * `(load-file "../tests/incA.mal")` -> `9`
901 * `(inc4 3)` -> `7`
902
903The `load-file` function does the following:
904 * Call `slurp` to read in a file by name. Surround the contents with
905 "(do ...)" so that the whole file will be treated as a single
906 program AST (abstract syntax tree).
907 * Call `read-string` on the string returned from `slurp`. This uses
908 the reader to read/convert the file contents into mal data/AST.
909 * Call `eval` (the one in the REPL environment) on the AST returned
910 from `read-string` to "run" it.
911
912Now go to the top level, run the step 6 tests. The optional tests will
913need support from the reader for comments, vectors and hash-maps:
914```
915make test^quux^step6
916```
917
918Congratulations, you now have a full-fledged scripting language that
6fef8e58
JM
919can run other mal programs. The `slurp` function loads a file as
920a string, the `read-string` function calls the mal reader to turn that
921stirng into data, and the `eval` function takes data and evaluates it
922as a normal mal program. However, it is important to note that the
923`eval` function is not just for running external programs. Because mal
924programs are regular mal data structures, you can dynamically generate
925or manipulate those data structures before calling `eval` on them.
926This isomorphisism (same shape) between data and programs is known as
927"homoiconicity". Lisp languages are homoiconic and this property
928distinguishes them from most other programming languages.
929
930You mal implementation is quite powerful already but the set of
931functions that are available (from `core.qx`) is fairly limited. The
932bulk of the functions you will add are described in step 9 and step A,
933but you will begin to flesh them out over the next few steps to
934support quoting (step 7) and macros (step 8).
0f4ca9d1
JM
935
936
45a8b3ca 937#### Deferrable:
0f4ca9d1
JM
938
939* Add the ability to run another mal program from the command line.
940 Prior to the REPL loop, check if your mal implementation is called
941 with command line arguments. If so, treat the first argument as
942 a filename and use `rep` to call `load-file` on that filename, and
943 finally exit/terminate execution.
944
945* Add the rest of the command line arguments to your REPL environment
946 so that programs that are run with `load-file` have access to their
6fef8e58 947 calling environmnet. Add a new "\*ARGV\*" (symbol) entry to your REPL
0f4ca9d1
JM
948 environment. The value of this entry should be the rest of the
949 command line arguments as a mal list value.
950
951
952<a name="step7"></a>
953
954### Step 7: Quoting
955
956![step7_quote architecture](step7_quote.png)
957
958In step 7 you will add the special forms `quote` and `quasiquote` and
959add supporting core functions `cons` and `concat`. The two quote forms
960add a powerful abstraction for manipulating mal code itself
961(meta-programming).
962
0f4ca9d1
JM
963The `quote` special form indicates to the evaluator (`EVAL`) that the
964parameter should not be evaluated (yet). At first glance, this might
965not seem particular useful but an example of what this enables is the
966ability for a mal program to refer to a symbol itself rather than the
967value that it evaluates to. Likewise with lists. For example, consider
968the following:
969
970* `(prn abc)`: this will lookup the symbol `abc` in the current
971 evaluation environment and print it. This will result in error if
972 `abc` is not defined.
973* `(prn (quote abc))`: this will print "abc" (prints the symbol
974 itself). This will work regardless of whether `abc` is defined in
975 the current environment.
976* `(prn (1 2 3))`: this will result in an error because `1` is not
977 a function and cannot be applied to the arguments `(2 3)`.
978* `(prn (quote (1 2 3)))`: this will print "(1 2 3)".
979* `(def! l (quote (1 2 3)))`: list quoting allows us to define lists
980 directly in the code (list literal). Another way of doing this is
981 with the list function: `(def! l (list 1 2 3))`.
982
983The second special quoting form is `quasiquote`. This allows a quoted
984list to have internal elements of the list that are temporarily
985unquoted (normal evaluation). There are two special forms that only
986mean something within a quasiquoted list: `unquote` and
987`splice-unquote`. These are perhaps best explained with some examples:
988
989* `(def! lst (quote (2 3)))` -> `(2 3)`
990* `(quasiquote (1 (unquote lst)))` -> `(1 (2 3))`
991* `(quasiquote (1 (splice-unquote lst)))` -> `(1 2 3)`
992
993The `unquote` form turns evaluation back on for its argument and the
994result of evaluation is put in place into the quasiquoted list. The
995`splice-unquote` also turns evaluation back on for its argument, but
996the evaluated value must be a list which is then "spliced" into the
997quasiquoted list. The true power of the quasiquote form will be
998manifest when it used together with macros (in the next step).
999
ffd31966
JM
1000Compare the pseudocode for step 6 and step 7 to get a basic idea of
1001the changes that will be made during this step:
1002```
1003diff -urp ../process/step6_file.txt ../process/step7_quote.txt
1004```
1005
1006* Copy `step6_file.qx` to `step7_quote.qx`.
1007
0f4ca9d1
JM
1008* Before implementing the quoting forms, you will need to implement
1009* some supporting functions in the core namespace:
1010 * `cons`: this function takes a list as its second
1011 parameter and returns a new list that has the first argument
1012 prepended to it.
1013 * `concat`: this functions takes 0 or more lists as
1014 parameters and returns a new list that is a concatenation of all
1015 the list parameters.
1016
1017An aside on immutability: note that neither cons or concat mutate
1018their original list arguments. Any references to them (i.e. other
1019lists that they may be "contained" in) will still refer to the
1020original unchanged value. Mal, like Clojure, is a language which uses
1021immutable data structures. I encourage you to read about the power and
1022importance of immutability as implemented in Clojure (from which
1023Mal borrows most of its syntax and feature-set).
1024
1025* Add the `quote` special form. This form just returns its argument
1026 (the second list element of `ast`).
1027
1028* Add the `quasiquote` special form. First implement a helper function
1029 `is_pair` that returns true if the parameter is a non-empty list.
1030 Then define a `quasiquote` function. This is called from `EVAL` with
1031 the first `ast` argument (second list element) and then `ast` is set
1032 to the result and execution continues at the top of the loop (TCO).
1033 The `quasiquote` function takes a parameter `ast` and has the
1034 following conditional:
1035 1. if `is_pair` of `ast` is false: return a new list containing:
1036 a symbol named "quote" and `ast`.
1037 2. else if the first element of `ast` is a symbol named "unquote":
1038 return the second element of `ast`.
1039 3. if `is_pair` of first element of `ast` is true and the first
1040 element of first element of `ast` (`ast[0][0]`) is a symbol named
1041 "splice-unquote": return a new list containing: a symbol named
1042 "concat", the second element of first element of `ast`
1043 (`ast[0][1]`), and the result of calling `quasiquote` with the
1044 second through last element of `ast`.
1045 4. otherwise: return a new list containing: a symbol named "cons", the
1046 result of calling `quasiquote` on first element of `ast`
1047 (`ast[0]`), and result of calling `quasiquote` with the second
1048 through last element of `ast`.
1049
1050
1051Now go to the top level, run the step 7 tests:
1052```
1053make test^quux^step7
1054```
1055
1056Quoting is one of the more mundane functions available in mal, but do
1057not let that discourage you. Your mal implementation is almost
1058complete, and quoting sets the stage for the next very exiting step:
1059macros.
1060
1061
45a8b3ca 1062#### Deferrable
0f4ca9d1
JM
1063
1064* The full names for the quoting forms are fairly verbose. Most Lisp
1065 languages have a short-hand syntax and Mal is no exception. These
1066 short-hand syntaxes are known as reader macros because they allow us
1067 to manipulate mal code during the reader phase. Macros that run
1068 during the eval phase are just called "macros" and are described in
1069 the next section. Expand the conditional with reader `read_form`
1070 function to add the following four cases:
1071 * token is "'" (single quote): return a new list that contains the
1072 symbol "quote" and the result of reading the next form
1073 (`read_form`).
1074 * token is "`" (back-tick): return a new list that contains the
1075 symbol "quasiquote" and the result of reading the next form
1076 (`read_form`).
1077 * token is "~" (tilde): return a new list that contains the
1078 symbol "unquote" and the result of reading the next form
1079 (`read_form`).
1080 * token is "~@" (tilde + at sign): return a new list that contains
1081 the symbol "splice-unquote" and the result of reading the next
1082 form (`read_form`).
1083
1084* Add support for quoting of vectors. The `is_pair` function should
1085 return true if the argument is a non-empty list or vector. `cons`
1086 should also accept a vector as the second argument. The return value
1087 is a list regardless. `concat` should support concatenation of
1088 lists, vectors, or a mix or both. The result is always a list.
1089
1090
1091<a name="step8"></a>
1092
1093### Step 8: Macros
1094
1095![step8_macros architecture](step8_macros.png)
1096
1097Your mal implementation is now ready for one of the most Lispy and
1098exciting of all programming concepts: macros. In the previous step,
1099quoting enabled some simple manipulation data structures and therefore
1100manipulation of mal code (because the `eval` function from step
11016 turns mal data into code). In this step you will be able to mark mal
1102functions as macros which can manipulate mal code before it is
1103evaluated. In other words, macros are user-defined special forms. Or
1104to look at it another way, macros allow mal programs to redefine
1105the mal language itself.
1106
1107Compare the pseudocode for step 7 and step 8 to get a basic idea of
1108the changes that will be made during this step:
1109```
1110diff -urp ../process/step7_quote.txt ../process/step8_macros.txt
1111```
1112
ffd31966
JM
1113* Copy `step7_quote.qx` to `step8_macros.qx`.
1114
1115
0f4ca9d1
JM
1116You might think that the infinite power of macros would require some
1117sort of complex mechanism, but the implementation is actually fairly
1118simple.
1119
1120* Add a new attribute `is_macro` to mal function types. This should
1121 default to false.
1122
1123* Add a new special form `defmacro!`. This is very similar to the
1124 `def!` form, but before the evaluated value (mal function) is set in
1125 the environment, the `is_macro` attribute should be set to true.
1126
1127* Add a `is_macro_call` function: This function takes arguments `ast`
1128 and `env`. It returns true if `ast` is a list that contains a symbol
1129 as the first element and that symbol refers to a function in the
1130 `env` environment and that function has the `is_macro` attribute set
1131 to true. Otherwise, it returns false.
1132
1133* Add a `macroexpand` function: This function takes arguments `ast`
1134 and `env`. It calls `is_macro_call` with `ast` and `env` and loops
1135 while that condition is true. Inside the loop, the first element of
1136 the `ast` list (a symbol), is looked up in the environment to get
1137 the macro function. This macro function is then called/applied with
1138 the rest of the `ast` elements (2nd through the last) as arguments.
1139 The return value of the macro call becomes the new value of `ast`.
1140 When the loop completes because `ast` no longer represents a macro
1141 call, the current value of `ast` is returned.
1142
1143* In the evaluator (`EVAL`) before the special forms switch (apply
1144 section), perform macro expansion by calling the `macroexpand`
1145 function with the current value of `ast` and `env`. Set `ast` to the
1146 result of that call. If the new value of `ast` is no longer a list
1147 after macro expansion, then return `ast`, otherwise continue with
1148 the rest of the apply section (special forms switch).
1149
1150* Add a new special form condition for `macroexpand`. Call the
1151 `macroexpand` function using the first `ast` argument (second list
1152 element) and `env`. Return the result. This special form allows
1153 a mal program to do explicit macro expansion without applying the
1154 result (which can be useful for debugging macro expansion).
1155
1156Now go to the top level, run the step 8 tests:
1157```
1158make test^quux^step8
1159```
1160
8a98ef9a
JM
1161There is a reasonably good chance that the macro tests will not pass
1162the first time. Although the implementation of macros is fairly
1163simple, debugging runtime bugs with macros can be fairly tricky. If
1164you do run into subtle problems that are difficult to solve, let me
1165recommend a couple of approaches:
1166
1167* Use the macroexpand special form to eliminate one of the layers of
1168 indirection (to expand but skip evaluate). This will often reveal
1169 the source of the issue.
1170* Add a debug print statement to the top of your main `eval` function
1171 (inside the TCO loop) to print the current value of `ast` (hint use
1172 `pr_str` to get easier to debug output). Pull up the step8
1173 implementation from another language and uncomment its `eval`
1174 function (yes, I give you permission to violate the rule this once).
1175 Run the two side-by-side. The first difference is likely to point to
1176 the bug.
1177
1178Congratulations! You now have a Lisp interpreter with a super power
1179that most non-Lisp languages can only dream of (I have it on good
1180authority that languages dream when you are not using them). If you
1181are not already familiar with Lisp macros, I suggest the following
1182excercise: write a recursive macro that handles postfixed mal code
1183(with the function as the last parameter instead of the first). Or
1184not. I have not actually done so myself, but I have heard it is an
1185interesting excercise.
1186
1187In the next step you will add try/catch style exception handling to
1188your implementation in addition to some new core functions. After
1189step9 you will be very close to having a fully self-hosting mal
1190implementation. Let us continue!
1191
0f4ca9d1 1192
45a8b3ca 1193### Deferrable
0f4ca9d1
JM
1194
1195* Add the following new core functions which are frequently used in
1196 macro functions:
1197 * `nth`: this function takes a list (or vector) and a number (index)
1198 as arguments, returns the element of the list at the given index.
1199 If the index is out of range, this function raises an exception.
1200 * `first`: this function takes a list (or vector) as its argument
1201 and return the first element. If the list (or vector) is empty or
1202 is `nil` then `nil` is returned.
1203 * `rest`: this function takes a list (or vector) as its argument and
1204 returns a new list containing all the elements except the first.
1205
1206* In the main program, use the `rep` function to define two new
1207 control structures macros. Here are the string arguments for `rep`
1208 to define these macros:
1209 * `cond`: "(defmacro! cond (fn* (& xs) (if (> (count xs) 0) (list 'if (first xs) (if (> (count xs) 1) (nth xs 1) (throw \"odd number of forms to cond\")) (cons 'cond (rest (rest xs)))))))"
1210 * `or`: "(defmacro! or (fn* (& xs) (if (empty? xs) nil (if (= 1 (count xs)) (first xs) `(let* (or_FIXME ~(first xs)) (if or_FIXME or_FIXME (or ~@(rest xs))))))))"
1211
1212
ffd31966
JM
1213<a name="step9"></a>
1214
1215### Step 9: Try
1216
1217![step9_try architecture](step9_try.png)
1218
2345a3da
JM
1219In this step you will implement the final mal special form for
1220error/exception handling: `try*/catch*`. You will also add several core
1221functions to you implementation. In particular, you will enhance the
1222functional programming pedigree of you implementation by adding the
1223`apply` and `map` core functions.
1224
ffd31966 1225Compare the pseudocode for step 8 and step 9 to get a basic idea of
dbac60df 1226the changes that will be made during this step:
ffd31966
JM
1227```
1228diff -urp ../process/step8_macros.txt ../process/step9_try.txt
1229```
1230
1231* Copy `step8_macros.qx` to `step9_try.qx`.
1232
2345a3da
JM
1233* Add the `try*/catch*` special form to the EVAL function. The
1234 try catch form looks like this: `(try* A (catch* B C))`. The form
1235 `A` is evaluated, if it throws an exception, then form `C` is
1236 evaluated with a new environment that binds the symbol B to the
1237 value of the exception that was thrown.
1238 * If your target language has built-in try/catch style exception
1239 handling then you are already 90% of the way done. Add a
1240 (native language) try/catch block that calls evaluates `A` within
1241 the try block and catches all exceptions. If an exception is
1242 caught, then translate it to a mal type/value. For native
1243 exceptions this is either the message string or a mal hash-map
1244 that contains the message string and other attributes of the
1245 exception. When a regular mal types/values is used as an
1246 exception, you will probably need to store it within a native
1247 exception type in order to be able to convey/transport it using
1248 the native try/catch mechanism. Then you will extract the mal
1249 type/value from the native exception. Create a new mal environment
1250 that binds B to the value of the exception. Finally, evaluate `C`
1251 using that new environment.
1252 * If your target language does not have built-in try/catch style
1253 exception handling then you have some extra work to do. One of the
1254 most straightforward approaches is to create a a global error
1255 variable that stores the thrown mal type/value. The complication
1256 is that there are a bunch of places where you must check to see if
1257 the global error state is set and return without proceeding. The
1258 rule of thumb is that this check should happen at the top of your
1259 EVAL function and also right after any call to EVAL (and after any
1260 function call that might happen to call EVAL further down the
1261 chain). Yes, it is ugly, but you were warned in the section on
1262 picking a language.
1263
1264* Add the `throw` core function.
1265 * If your language supports try/catch style exception handling, then
1266 this function takes a mal type/value and throws/raises it as an
1267 exception. In order to do this, you may need to create a custom
1268 exception object that wraps a mal value/type.
1269 * If your language does not support try/catch style exception
1270 handling, then set the global error state to the mal type/value.
1271
1272* Add the `apply` and `map` core functions. In step 5, if you did not
1273 add the original function (`fn`) to the structure returned from
bd62ff74
JM
1274 `fn*`, the you will need to do so now.
1275 * `apply`: takes at least two arguments. The first argument is
1276 a function and the last argument is list (or vector). The
1277 arguments between the function and the last arguemnt (if there are
1278 any) are concatenated with the final argument to create the
1279 arguments that are used to call the function. The apply
1280 function allows a function to be called with arguments that are
1281 contained in a list (or vector). In other words, `(apply F A B [C
1282 D])` is equivalent to `(F A B C D)`.
1283 * `map`: takes a function and a list (or vector) and evaluates the
1284 function against every element of the list (or vector) one at
1285 a time and returns the results as a list.
2345a3da
JM
1286
1287* Add some type predicates core functions. In Lisp, predicates are
1288 functions that return true/false (or true value/nil) and typically
bd62ff74
JM
1289 end in "?" or "p".
1290 * `nil?`: takes a single argument and returns true (mal true value)
1291 if the argument is nil (mal nil value).
1292 * `true?`: takes a single argument and returns true (mal true value)
1293 if the argument is a true value (mal true value).
1294 * `false?`: takes a single argument and returns true (mal true
1295 value) if the argument is a false value (mal false value).
1296 * `symbol?`: takes a single argument and returns true (mal true
1297 value) if the argument is a symbol (mal symbol value).
1298
1299Now go to the top level, run the step 9 tests:
1300```
1301make test^quux^step9
1302```
1303
1304Your mal implementation is now essentially a fully featured Lisp
1305interpreter. But if you stop now you will miss one of the most
1306satisfying and enlightening aspects of creating a mal implementation:
1307self-hosting.
2345a3da
JM
1308
1309### Deferrable
1310
e37d9b49
JM
1311* Add the following new core functions:
1312 * `symbol`: takes a string and returns a new symbol with the string
1313 as its name.
1314 * `keyword`: takes a string and returns a keyword with the same name
1315 (usually just be prepending the special keyword
1316 unicode symbol). This function should also detect if the argument
1317 is already a keyword and just return it.
1318 * `keyword?`: takes a single argument and returns true (mal true
1319 value) if the argument is a keyword, otherwise returns false (mal
1320 false value).
1321 * `vector`: takes a variable number of arguments and returns
1322 a vector containing those arguments.
1323 * `vector?`: takes a single argument and returns true (mal true
1324 value) if the argument is a vector, otherwise returns false (mal
1325 false value).
1326 * `hash-map`: takes a variable but even number of arguments and
1327 returns a new mal hash-map value with keys from the odd arguments
1328 and values from the even arguments respectively. This is basically
1329 the functional form of the `{}` reader literal syntax.
1330 * `map?`: takes a single argument and returns true (mal true
1331 value) if the argument is a hash-map, otherwise returns false (mal
1332 false value).
1333 * `assoc`: takes a hash-map as the first argument and the remaining
1334 arguments are odd/even key/value pairs to "associate" (merge) into
1335 the hash-map. Note that the original hash-map is unchanged
1336 (remember, mal values are immutable), and a new hash-map
1337 containing the old hash-maps key/values plus the merged key/value
1338 arguments is returned.
1339 * `dissoc`: takes a hash-map and a list of keys to remove from the
1340 hash-map. Again, note that the original hash-map is unchanged and
1341 a new hash-map with the keys removed is returned. Key arguments
1342 that do not exist in the hash-map are ignored.
1343 * `get`: takes a hash-map and a key and returns the value of looking
1344 up that key in the hash-map. If the key is not found in the
1345 hash-map then nil is returned.
1346 * `contains?`: takes a hash-map and a key and returns true (mal true
1347 value) if the key exists in the hash-map and false (mal false
1348 value) otherwise.
1349 * `keys`: takes a hash-map and returns a list (mal list value) of
1350 all the keys in the hash-map.
1351 * `vals`: takes a hash-map and returns a list (mal list value) of
1352 all the values in the hash-map.
1353 * `sequential?`: takes a single arguments and returns true (mal true
1354 value) if it is a list or a vector, otherwise returns false (mal
1355 false value).
ffd31966
JM
1356
1357
8569b2af 1358<a name="stepA"></a>
ffd31966 1359
bd62ff74 1360### Step A: Mutation, Self-hosting and Interop
ffd31966 1361
90f618cb 1362![stepA_mal architecture](stepA_mal.png)
ffd31966 1363
bd62ff74
JM
1364You have reached the final step of your mal implementation. This step
1365is kind of a catchall for things that did not fit into other steps.
1366But most importantly, the changes you make in this step will unlock
1367the magical power known as "self-hosting". You might have noticed
1368that one of the languages that mal is implemented in is "mal". Any mal
1369implementation that is complete enough can run the mal implementation
1370of mal. You might need to pull out your hammock and ponder this for
1371a while if you have never built a compiler or interpreter before. Look
1372at the step source files for the mal implementation of mal (it is not
1373cheating now that you have reached step A).
1374
1375If you deferred the implementation of keywords, vectors and hash-maps,
1376now is the time to go back and implement them if you want your
1377implementation to self-host.
1378
ffd31966
JM
1379Compare the pseudocode for step 9 and step A to get a basic idea of
1380the changes that will be made during this step:
1381```
dbac60df 1382diff -urp ../process/step9_try.txt ../process/stepA_mal.txt
ffd31966
JM
1383```
1384
90f618cb 1385* Copy `step9_try.qx` to `stepA_mal.qx`.
ffd31966 1386
bd62ff74
JM
1387* Add meta-data support to mal functions. TODO. Should be separate
1388 from the function macro flag.
1389
1390* Add atom data type and supporting core functions. This is the only
1391 mal data type/value that is mutable. TODO
1392
1393* Add the `readline` core function. TODO
1394
1395
1396Now go to the top level, run the step 9 tests:
1397```
1398make test^quux^step9
1399```
1400
1401Once you have passed all the non-optional step A tests, it is time to
1402try self-hosting. Run your step A implementation as normal, but use
1403the file argument mode you added in step 6 to run a each of the step
1404from the mal implementation:
1405```
1406./stepA_mal.qx ../mal/step1_read_print.mal
1407./stepA_mal.qx ../mal/step2_eval.mal
1408...
1409./stepA_mal.qx ../mal/step9_try.mal
1410./stepA_mal.qx ../mal/stepA_mal.mal
1411```
1412
1413There is a very good change that you will encounter an error at some
1414point while trying to run the mal in mal implementation steps above.
1415Debugging failures that happen while self-hosting is MUCH more
1416difficult and mind bending. One of the best approaches I have
1417personally found is to add prn statements to the mal implemenation
1418step (not your own implementation of mal) that is causing problems.
1419
1420Another approach I have frequently used is to pull out the code from
1421the mal implementation that is causing the problem and simplify it
1422step by step until you have a simple piece of mal code that still
1423reproduces the problem. Once the reproducer is simple enough you will
1424probably know where in your own implementation that problem is likely
1425to be. Please add your simple reproducer as a test case so that future
1426implementers will fix similar issues in their code before they get to
1427self-hosting when it is much more difficult to track down and fix.
1428
1429Once you can manually run all the self-hosted steps, it is time to run
1430all the tests in self-hosted mode:
1431```
1432make MAL_IMPL=quux test^mal
1433```
1434
1435When you run into problems (which you almost certainly will), use the
1436same process described above to debug them.
1437
1438Congratulations!!! When all the tests pass, you should pause for
1439a moment and consider what you have accomplished. You have implemented
1440a Lisp interpreter that is powerful and complete enough to run a large
1441mal program which is itself an implementation of the mal language. You
1442might even be asking if you can continue the "inception" by using your
1443implementation to run a mal implementation which itself runs the mal
1444implementation.
1445
1446### Optional
1447
1448* Add metadata support to composite data types, symbols and native
1449 functions. TODO
1450* `time-ms` TODO
1451* `conj` TODO
ffd31966
JM
1452
1453
0f4ca9d1
JM
1454## TODO:
1455
1456* simplify: "X argument (list element Y)" -> ast[Y]
0f4ca9d1
JM
1457* list of types with metadata: list, vector, hash-map, mal functions
1458* more clarity about when to peek and poke in read_list and read_form
1459* tokenizer: use first group rather than whole match (to eliminate
1460 whitespace/commas)