Merge pull request #208 from dubek/vim-fix-time-ms
[jackhill/mal.git] / process / guide.md
CommitLineData
0f4ca9d1
JM
1# The Make-A-Lisp Process
2
3So you want to write a Lisp interpreter? Welcome!
4
5The goal of the Make-A-Lisp project is to make it easy to write your
6own Lisp interpreter without sacrificing those many "Aha!" moments
7that come from ascending the McCarthy mountain. When you reach the peak
8of this particular mountain, you will have an interpreter for the mal
9Lisp language that is powerful enough to be self-hosting, meaning it
10will be able to run a mal interpreter written in mal itself.
11
12So jump right in (er ... start the climb)!
13
dc791440
DM
14- [Pick a language](#pick-a-language)
15- [Getting started](#getting-started)
16- [General hints](#general-hints)
17- [The Make-A-Lisp Process](#the-make-a-lisp-process-1)
18 - [Step 0: The REPL](#step-0-the-repl)
19 - [Step 1: Read and Print](#step-1-read-and-print)
20 - [Step 2: Eval](#step-2-eval)
21 - [Step 3: Environments](#step-3-environments)
22 - [Step 4: If Fn Do](#step-4-if-fn-do)
23 - [Step 5: Tail call optimization](#step-5-tail-call-optimization)
24 - [Step 6: Files, Mutation, and Evil](#step-6-files-mutation-and-evil)
25 - [Step 7: Quoting](#step-7-quoting)
26 - [Step 8: Macros](#step-8-macros)
27 - [Step 9: Try](#step-9-try)
28 - [Step A: Metadata, Self-hosting and Interop](#step-a-metadata-self-hosting-and-interop)
29
30
0f4ca9d1
JM
31## Pick a language
32
33You might already have a language in mind that you want to use.
34Technically speaking, mal can be implemented in any sufficiently
c10dcb94 35complete programming language (i.e. Turing complete), however, there are a few
0f4ca9d1
JM
36language features that can make the task MUCH easier. Here are some of
37them in rough order of importance:
38
39* A sequential compound data structure (e.g. arrays, lists,
40 vectors, etc)
41* An associative compound data structure (e.g. a dictionary,
42 hash-map, associative array, etc)
43* Function references (first class functions, function pointers,
44 etc)
45* Real exception handling (try/catch, raise, throw, etc)
46* Variable argument functions (variadic, var args, splats, apply, etc)
47* Function closures
48* PCRE regular expressions
49
50In addition, the following will make your task especially easy:
51
bf518367
JM
52* Dynamic typing / boxed types (specifically, the ability to store
53 different data types in the sequential and associative structures
54 and the language keeps track of the type for you)
0f4ca9d1
JM
55* Compound data types support arbitrary runtime "hidden" data
56 (metadata, metatables, dynamic fields attributes)
57
58Here are some examples of languages that have all of the above
59features: JavaScript, Ruby, Python, Lua, R, Clojure.
60
cf70df1f
JM
61Michael Fogus has some great blog posts on interesting but less well
62known languages and many of the languages on his lists do not yet have
63any mal implementations:
64* http://blog.fogus.me/2011/08/14/perlis-languages/
65* http://blog.fogus.me/2011/10/18/programming-language-development-the-past-5-years/
66
0f4ca9d1
JM
67Many of the most popular languages already have Mal implementations.
68However, this should not discourage you from creating your own
69implementation in a language that already has one. However, if you go
70this route, I suggest you avoid referring to the existing
71implementations (i.e. "cheating") to maximize your learning experience
72instead of just borrowing mine. On the other hand, if your goal is to
73add new implementations to mal as efficiently as possible, then you
74SHOULD find the most similar target language implementation and refer
75to it frequently.
76
bf518367
JM
77If you want a fairly long list of programming languages with an
78approximate measure of popularity, try the [Programming Language
79Popularity Chart](http://langpop.corger.nl/)
80
0f4ca9d1
JM
81
82## Getting started
83
84* Install your chosen language interpreter/compiler, language package
85 manager and build tools (if applicable)
86
87* Fork the mal repository on github and then clone your forked
88 repository:
89```
90git clone git@github.com:YOUR_NAME/mal.git
91cd mal
92```
93
fc03712f
EC
94* Make a new directory for your implementation. For example, if your
95language is called "quux":
0f4ca9d1
JM
96```
97mkdir quux
98```
99
100* Modify the top level Makefile to allow the tests to be run against
101 your implementation. For example, if your language is named "quux"
102 and uses "qx" as the file extension, then make the following
103 3 modifications to Makefile:
104```
105IMPLS = ... quux ...
106...
107quux_STEP_TO_PROG = mylang/$($(1)).qx
108...
109quux_RUNSTEP = ../$(2) $(3)
110```
111
112This allows you to run tests against your implementation like this:
113```
e5737b08 114make "test^quux^stepX"
0f4ca9d1
JM
115```
116
396d869e
JM
117TODO: If your implementation language is a compiled language, then you
118should also add a Makefile at the top level of your implementation
5c052692
WH
119directory.
120
121Your Makefile will define how to build the files pointed to by the
396d869e
JM
122quux_STEP_TO_PROG macro. The top-level Makefile will attempt to build
123those targets before running tests. If it is a scripting
124language/uncompiled, then no Makefile is necessary because
125quux_STEP_TO_PROG will point to a source file that already exists and
126does not need to be compiled/built.
127
0f4ca9d1
JM
128
129## General hints
130
131Stackoverflow and Google are your best friends. Modern polyglot
132developers do not memorize dozens of programming languages. Instead,
133they learn the peculiar terminology used with each language and then
134use this to search for their answers.
135
136Here are some other resources where multiple languages are
137compared/described:
138* http://learnxinyminutes.com/
139* http://hyperpolyglot.org/
140* http://rosettacode.org/
334a71b6 141* http://rigaux.org/language-study/syntax-across-languages/
0f4ca9d1
JM
142
143Do not let yourself be bogged down by specific problems. While the
144make-a-lisp process is structured as a series of steps, the reality is
145that building a lisp interpreter is more like a branching tree. If you
146get stuck on tail call optimization, or hash-maps, move on to other
147things. You will often have a stroke of inspiration for a problem as
148you work through other functionality. I have tried to structure this
45a8b3ca
JM
149guide and the tests to make clear which things can be deferred until
150later.
151
152An aside on deferrable/optional bits: when you run the tests for
153a given step, the last tests are often marked with an "optional"
154header. This indicates that these are tests for functionality that is
155not critical to finish a basic mal implementation. Many of the steps
156in this process guide have a "Deferrable" section, however, it is not
157quite the same meaning. Those sections include the functionality that
158is marked as optional in the tests, but they also include
0f4ca9d1
JM
159functionality that becomes mandatory at a later step. In other words,
160this is a "make your own Lisp adventure".
161
162Use test driven development. Each step of the make-a-lisp process has
163a bunch of tests associated with it and there is an easy script to run
164all the tests for a specific step in the process. Pick a failing test,
165fix it, repeat until all the tests for that step pass.
166
5c052692
WH
167## Reference Code
168
0f4ca9d1
JM
169The `process` directory contains abbreviated pseudocode and
170architecture images for each step of the make-a-lisp process. Use
171a textual diff/comparison tool to compare the previous pseudocode step
172with the one you are working on. The architecture images have changes
173from the previous step highlighted in red.
174
daa1cf3f
JM
175If you get completely stuck and are feeling like giving up, then you
176should "cheat" by referring to the same step or functionality in
177a existing implementation language. You are here to learn, not to take
178a test, so do not feel bad about it. Okay, you should feel a little
179bit bad about it.
0f4ca9d1
JM
180
181
182## The Make-A-Lisp Process
183
184In the steps that follow the name of the target language is "quux" and
185the file extension for that language is "qx".
186
187
188<a name="step0"></a>
189
190### Step 0: The REPL
191
192![step0_repl architecture](step0_repl.png)
193
194This step is basically just creating a skeleton of your interpreter.
195
196* Create a `step0_repl.qx` file in `quux/`.
197
198* Add the 4 trivial functions `READ`, `EVAL`, `PRINT`, and `rep`
199 (read-eval-print). `READ`, `EVAL`, and `PRINT` are basically just
200 stubs that return their first parameter (a string if your target
201 language is a statically typed) and `rep` calls them in order
202 passing the return to the input of the next.
203
58ba5af4
JM
204* Add a main loop that repeatedly prints a prompt (needs to be
205 "user> " for later tests to pass), gets a line of input from the
206 user, calls `rep` with that line of input, and then prints out the
207 result from `rep`. It should also exit when you send it an EOF
208 (often Ctrl-D).
0f4ca9d1
JM
209
210* If you are using a compiled (ahead-of-time rather than just-in-time)
211 language, then create a Makefile (or appropriate project definition
212 file) in your directory.
213
daa1cf3f
JM
214It is time to run your first tests. This will check that your program
215does input and output in a way that can be captured by the test
216harness. Go to the top level and run the following:
217```
e5737b08 218make "test^quux^step0"
daa1cf3f 219```
0f4ca9d1
JM
220
221Add and then commit your new `step0_repl.qx` and `Makefile` to git.
222
223Congratulations! You have just completed the first step of the
224make-a-lisp process.
225
226
bd62ff74 227#### Optional:
0f4ca9d1
JM
228
229* Add full line editing and command history support to your
230 interpreter REPL. Many languages have a library/module that provide
231 line editing support. Another option if your language supports it is
232 to use an FFI (foreign function interface) to load and call directly
94a954f5 233 into GNU readline, editline, or linenoise library. Add line
0f4ca9d1
JM
234 editing interface code to `readline.qx`
235
236
237<a name="step1"></a>
238
239### Step 1: Read and Print
240
241![step1_read_print architecture](step1_read_print.png)
242
243In this step, your interpreter will "read" the string from the user
244and parse it into an internal tree data structure (an abstract syntax
245tree) and then take that data structure and "print" it back to
246a string.
247
248In non-lisp languages, this step (called "lexing and parsing") can be
249one of the most complicated parts of the compiler/interpreter. In
250Lisp, the data structure that you want in memory is basically
251represented directly in the code that the programmer writes
252(homoiconicity).
253
254For example, if the string is "(+ 2 (* 3 4))" then the read function
255will process this into a tree structure that looks like this:
256```
257 List
258 / | \
259 / | \
260 / | \
261 Sym:+ Int:2 List
262 / | \
263 / | \
264 / | \
265 Sym:* Int:3 Int:4
266```
267
268Each left paren and its matching right paren (lisp "sexpr") becomes
269a node in the tree and everything else becomes a leaf in the tree.
270
271If you can find code for an implementation of a JSON encoder/decoder
272in your target language then you can probably just borrow and modify
273that and be 75% of the way done with this step.
274
275The rest of this section is going to assume that you are not starting
276from an existing JSON encoder/decoder, but that you do have access to
277a Perl compatible regular expressions (PCRE) module/library. You can
278certainly implement the reader using simple string operations, but it
279is more involved. The `make`, `ps` (postscript) and Haskell
280implementations have examples of a reader/parser without using regular
281expression support.
282
283* Copy `step0_repl.qx` to `step1_read_print.qx`.
284
285* Add a `reader.qx` file to hold functions related to the reader.
286
287* If the target language has objects types (OOP), then the next step
288 is to create a simple stateful Reader object in `reader.qx`. This
289 object will store the tokens and a position. The Reader object will
d717d991 290 have two methods: `next` and `peek`. `next` returns the token at
0f4ca9d1
JM
291 the current position and increments the position. `peek` just
292 returns the token at the current position.
293
294* Add a function `read_str` in `reader.qx`. This function
295 will call `tokenizer` and then create a new Reader object instance
296 with the tokens. Then it will call `read_form` with the Reader
297 instance.
298
299* Add a function `tokenizer` in `reader.qx`. This function will take
2315fd53 300 a single string and return an array/list
0f4ca9d1
JM
301 of all the tokens (strings) in it. The following regular expression
302 (PCRE) will match all mal tokens.
303```
304[\s,]*(~@|[\[\]{}()'`~^@]|"(?:\\.|[^\\"])*"|;.*|[^\s\[\]{}('"`,;)]*)
305```
cfdf00cc
OR
306* For each match captured within the parenthesis starting at char 6 of the
307 regular expression a new token will be created.
308
309 * `[\s,]*`: Matches any number of whitespaces or commas. This is not captured
310 so it will be ignored and not tokenized.
311
312 * `~@`: Captures the special two-characters `~@` (tokenized).
313
314 * ```[\[\]{}()'`~^@]```: Captures any special single character, one of
315 ```[]{}'`~^@``` (tokenized).
316
317 * `"(?:\\.|[^\\"])*"`: Starts capturing at a double-quote and stops at the
318 next double-quote unless it was proceeded by a backslash in which case it
a85b8454 319 includes it until the next double-quote (tokenized).
cfdf00cc
OR
320
321 * `;.*`: Captures any sequence of characters starting with `;` (tokenized).
322
323 * ```[^\s\[\]{}('"`,;)]*```: Captures a sequence of zero or more non special
324 characters (e.g. symbols, numbers, "true", "false", and "nil") and is sort
325 of the inverse of the one above that captures special characters (tokenized).
0f4ca9d1
JM
326
327* Add the function `read_form` to `reader.qx`. This function
328 will peek at the first token in the Reader object and switch on the
329 first character of that token. If the character is a left paren then
330 `read_list` is called with the Reader object. Otherwise, `read_atom`
331 is called with the Reader Object. The return value from `read_form`
332 is a mal data type. If your target language is statically typed then
333 you will need some way for `read_form` to return a variant or
334 subclass type. For example, if your language is object oriented,
2315fd53 335 then you can define a top level MalType (in `types.qx`) that all
0f4ca9d1
JM
336 your mal data types inherit from. The MalList type (which also
337 inherits from MalType) will contains a list/array of other MalTypes.
338 If your language is dynamically typed then you can likely just
339 return a plain list/array of other mal types.
340
341* Add the function `read_list` to `reader.qx`. This function will
342 repeatedly call `read_form` with the Reader object until it
343 encounters a ')' token (if it reach EOF before reading a ')' then
344 that is an error). It accumulates the results into a List type. If
345 your language does not have a sequential data type that can hold mal
346 type values you may need to implement one (in `types.qx`). Note
347 that `read_list` repeatedly calls `read_form` rather than
a85b8454 348 `read_atom`. This mutually recursive definition between `read_list`
0f4ca9d1
JM
349 and `read_form` is what allows lists to contain lists.
350
351* Add the function `read_atom` to `reader.qx`. This function will
352 look at the contents of the token and return the appropriate scalar
353 (simple/single) data type value. Initially, you can just implement
b84a50cc 354 numbers (integers) and symbols. This will allow you to proceed
0f4ca9d1
JM
355 through the next couple of steps before you will need to implement
356 the other fundamental mal types: nil, true, false, and string. The
357 remaining mal types: keyword, vector, hash-map, and atom do not
358 need to be implemented until step 9 (but can be implemented at any
359 point between this step and that). BTW, symbols types are just an
360 object that contains a single string name value (some languages have
361 symbol types already).
362
363* Add a file `printer.qx`. This file will contain a single function
364 `pr_str` which does the opposite of `read_str`: take a mal data
365 structure and return a string representation of it. But `pr_str` is
366 much simpler and is basically just a switch statement on the type of
367 the input object:
368
369 * symbol: return the string name of the symbol
370 * number: return the number as a string
371 * list: iterate through each element of the list calling `pr_str` on
372 it, then join the results with a space separator, and surround the
373 final result with parens
374
375* Change the `READ` function in `step1_read_print.qx` to call
376 `reader.read_str` and the `PRINT` function to call `printer.pr_str`.
377 `EVAL` continues to simply return its input but the type is now
378 a mal data type.
379
380You now have enough hooked up to begin testing your code. You can
381manually try some simple inputs:
382 * `123` -> `123`
383 * ` 123 ` -> `123`
384 * `abc` -> `abc`
385 * ` abc ` -> `abc`
386 * `(123 456)` -> `(123 456)`
387 * `( 123 456 789 ) ` -> `(123 456 789)`
6a767b0d 388 * `( + 2 (* 3 4) ) ` -> `(+ 2 (* 3 4))`
0f4ca9d1
JM
389
390To verify that your code is doing more than just eliminating extra
391spaces (and not failing), you can instrument your `reader.qx` functions.
392
393Once you have gotten past those simple manual tests, it is time to run
394the full suite of step 1 tests. Go to the top level and run the
395following:
396```
e5737b08 397make "test^quux^step1"
0f4ca9d1
JM
398```
399
400Fix any test failures related to symbols, numbers and lists.
401
402Depending on the functionality of your target language, it is likely
403that you have now just completed one of the most difficult steps. It
404is down hill from here. The remaining steps will probably be easier
405and each step will give progressively more bang for the buck.
406
45a8b3ca 407#### Deferrable:
0f4ca9d1
JM
408
409
410* Add error checking to your reader functions to make sure parens
411 are properly matched. Catch and print these errors in your main
412 loop. If your language does not have try/catch style bubble up
413 exception handling, then you will need to add explicit error
414 handling to your code to catch and pass on errors without crashing.
415
416* Add support for the other basic data type to your reader and printer
417 functions: string, nil, true, and false. These become mandatory at
8d78bc26
JM
418 step 4. When a string is read, the following transformations are
419 applied: a backslash followed by a doublequote is translated into
420 a plain doublequote character, a backslash followed by "n" is
421 translated into a newline, and a backslash followed by another
422 backslash is translated into a single backslash. To properly print
423 a string (for step 4 string functions), the `pr_str` function needs
424 another parameter called `print_readably`. When `print_readably` is
425 true, doublequotes, newlines, and backslashes are translated into
426 their printed representations (the reverse of the reader). The
427 `PRINT` function in the main program should call `pr_str` with
428 print_readably set to true.
0f4ca9d1 429
1c2eab90 430* Add support for the other mal types: keyword, vector, hash-map.
2345a3da
JM
431 * keyword: a keyword is a token that begins with a colon. A keyword
432 can just be stored as a string with special unicode prefix like
433 0x29E (or char 0xff/127 if the target language does not have good
434 unicode support) and the printer translates strings with that
435 prefix back to the keyword representation. This makes it easy to
436 use keywords as hash map keys in most languages. You can also
437 store keywords as a unique data type, but you will need to make
438 sure they can be used as hash map keys (which may involve doing
439 a similar prefixed translation anyways).
440 * vector: a vector can be implemented with same underlying
441 type as a list as long as there is some mechanism to keep track of
442 the difference. You can use the same reader function for both
443 lists and vectors by adding parameters for the starting and ending
444 tokens.
445 * hash-map: a hash-map is an associative data structure that maps
446 strings to other mal values. If you implement keywords as prefixed
447 strings, then you only need a native associative data structure
448 which supports string keys. Clojure allows any value to be a hash
449 map key, but the base functionality in mal is to support strings
450 and keyword keys. Because of the representation of hash-maps as
451 an alternating sequence of keys and values, you can probably use
452 the same reader function for hash-maps as lists and vectors with
453 parameters to indicate the starting and ending tokens. The odd
454 tokens are then used for keys with the corresponding even tokens
455 as the values.
456
457* Add support for reader macros which are forms that are
458 transformed into other forms during the read phase. Refer to
459 `tests/step1_read_print.mal` for the form that these macros should
460 take (they are just simple transformations of the token stream).
0f4ca9d1
JM
461
462* Add comment support to your reader. The tokenizer should ignore
463 tokens that start with ";". Your `read_str` function will need to
464 properly handle when the tokenizer returns no values. The simplest
465 way to do this is to return `nil` mal value. A cleaner option (that
466 does not print `nil` at the prompt is to throw a special exception
467 that causes the main loop to simply continue at the beginning of the
468 loop without calling `rep`.
469
470
471<a name="step2"></a>
472
473### Step 2: Eval
474
475![step2_eval architecture](step2_eval.png)
476
477In step 1 your mal interpreter was basically just a way to validate
478input and eliminate extraneous white space. In this step you will turn
479your interpreter into a simple number calculator by adding
480functionality to the evaluator (`EVAL`).
481
482Compare the pseudocode for step 1 and step 2 to get a basic idea of
483the changes that will be made during this step:
484```
485diff -urp ../process/step1_read_print.txt ../process/step2_eval.txt
486```
487
488* Copy `step1_read_print.qx` to `step2_eval.qx`.
489
490* Define a simple initial REPL environment. This environment is an
491 associative structure that maps symbols (or symbol names) to
492 numeric functions. For example, in python this would look something
493 like this:
494```
495repl_env = {'+': lambda a,b: a+b,
496 '-': lambda a,b: a-b,
497 '*': lambda a,b: a*b,
498 '/': lambda a,b: int(a/b)}
499```
500
501* Modify the `rep` function to pass the REPL environment as the second
502 parameter for the `EVAL` call.
503
504* Create a new function `eval_ast` which takes `ast` (mal data type)
505 and an associative structure (the environment from above).
506 `eval_ast` switches on the type of `ast` as follows:
507
508 * symbol: lookup the symbol in the environment structure and return
509 the value or raise an error no value is found
510 * list: return a new list that is the result of calling `EVAL` on
511 each of the members of the list
512 * otherwise just return the original `ast` value
513
514* Modify `EVAL` to check if the first parameter `ast` is a list.
515 * `ast` is not a list: then return the result of calling `eval_ast`
516 on it.
efa2daef 517 * `ast` is a empty list: return ast unchanged.
0f4ca9d1
JM
518 * `ast` is a list: call `eval_ast` to get a new evaluated list. Take
519 the first item of the evaluated list and call it as function using
520 the rest of the evaluated list as its arguments.
521
522If your target language does not have full variable length argument
523support (e.g. variadic, vararg, splats, apply) then you will need to
524pass the full list of arguments as a single parameter and split apart
525the individual values inside of every mal function. This is annoying,
526but workable.
527
528The process of taking a list and invoking or executing it to return
529something new is known in Lisp as the "apply" phase.
530
531Try some simple expressions:
532
533 * `(+ 2 3)` -> `5`
534 * `(+ 2 (* 3 4))` -> `14`
535
536The most likely challenge you will encounter is how to properly call
537a function references using an arguments list.
538
539Now go to the top level, run the step 2 tests and fix the errors.
540```
e5737b08 541make "test^quux^step2"
0f4ca9d1
JM
542```
543
544You now have a simple prefix notation calculator!
545
0748211b
DM
546#### Deferrable:
547
548* `eval_ast` should evaluate elements of vectors and hash-maps. Add the
549 following cases in `eval_ast`:
550 * If `ast` is a vector: return a new vector that is the result of calling
551 `EVAL` on each of the members of the vector.
552 * If `ast` is a hash-map: return a new hash-map which consists of key-value
553 pairs where the key is a key from the hash-map and the value is the result
554 of calling `EVAL` on the corresponding value.
555
0f4ca9d1
JM
556
557<a name="step3"></a>
558
559### Step 3: Environments
560
561![step3_env architecture](step3_env.png)
562
563In step 2 you were already introduced to REPL environment (`repl_env`)
564where the basic numeric functions were stored and looked up. In this
565step you will add the ability to create new environments (`let*`) and
daa1cf3f 566modify existing environments (`def!`).
0f4ca9d1
JM
567
568A Lisp environment is an associative data structure that maps symbols (the
569keys) to values. But Lisp environments have an additional important
570function: they can refer to another environment (the outer
571environment). During environment lookups, if the current environment
572does not have the symbol, the lookup continues in the outer
573environment, and continues this way until the symbol is either found,
574or the outer environment is `nil` (the outermost environment in the
575chain).
576
577Compare the pseudocode for step 2 and step 3 to get a basic idea of
578the changes that will be made during this step:
579```
580diff -urp ../process/step2_eval.txt ../process/step3_env.txt
581```
582
d9c020b0 583* Copy `step2_eval.qx` to `step3_env.qx`.
0f4ca9d1
JM
584
585* Create `env.qx` to hold the environment definition.
586
587* Define an `Env` object that is instantiated with a single `outer`
588 parameter and starts with an empty associative data structure
589 property `data`.
590
591* Define three methods for the Env object:
592 * set: takes a symbol key and a mal value and adds to the `data`
593 structure
594 * find: takes a symbol key and if the current environment contains
595 that key then return the environment. If no key is found and outer
596 is not `nil` then call find (recurse) on the outer environment.
597 * get: takes a symbol key and uses the `find` method to locate the
598 environment with the key, then returns the matching value. If no
599 key is found up the outer chain, then throws/raises a "not found"
600 error.
601
169ddeb2 602* Update `step3_env.qx` to use the new `Env` type to create the
0f4ca9d1
JM
603 repl_env (with a `nil` outer value) and use the `set` method to add
604 the numeric functions.
605
606* Modify `eval_ast` to call the `get` method on the `env` parameter.
607
608* Modify the apply section of `EVAL` to switch on the first element of
609 the list:
610 * symbol "def!": call the set method of the current environment
611 (second parameter of `EVAL` called `env`) using the unevaluated
612 first parameter (second list element) as the symbol key and the
613 evaluated second parameter as the value.
a727d6e8 614 * symbol "let\*": create a new environment using the current
0f4ca9d1 615 environment as the outer value and then use the first parameter as
a727d6e8
MK
616 a list of new bindings in the "let\*" environment. Take the second
617 element of the binding list, call `EVAL` using the new "let\*"
0f4ca9d1 618 environment as the evaluation environment, then call `set` on the
a727d6e8 619 "let\*" environment using the first binding list element as the key
0f4ca9d1
JM
620 and the evaluated second element as the value. This is repeated
621 for each odd/even pair in the binding list. Note in particular,
622 the bindings earlier in the list can be referred to by later
623 bindings. Finally, the second parameter (third element) of the
a727d6e8 624 original `let*` form is evaluated using the new "let\*" environment
0f4ca9d1
JM
625 and the result is returned as the result of the `let*` (the new
626 let environment is discarded upon completion).
627 * otherwise: call `eval_ast` on the list and apply the first element
628 to the rest as before.
629
630`def!` and `let*` are Lisp "specials" (or "special atoms") which means
631that they are language level features and more specifically that the
632rest of the list elements (arguments) may be evaluated differently (or
633not at all) unlike the default apply case where all elements of the
634list are evaluated before the first element is invoked. Lists which
635contain a "special" as the first element are known as "special forms".
34935988 636They are special because they follow special evaluation rules.
0f4ca9d1
JM
637
638Try some simple environment tests:
639
640 * `(def! a 6)` -> `6`
641 * `a` -> `6`
642 * `(def! b (+ a 2))` -> `8`
643 * `(+ a b)` -> `14`
644 * `(let* (c 2) c)` -> `2`
645
646Now go to the top level, run the step 3 tests and fix the errors.
647```
e5737b08 648make "test^quux^step3"
0f4ca9d1
JM
649```
650
651You mal implementation is still basically just a numeric calculator
652with save/restore capability. But you have set the foundation for step
6534 where it will begin to feel like a real programming language.
654
655
656An aside on mutation and typing:
657
658The "!" suffix on symbols is used to indicate that this symbol refers
659to a function that mutates something else. In this case, the `def!`
660symbol indicates a special form that will mutate the current
661environment. Many (maybe even most) of runtime problems that are
662encountered in software engineering are a result of mutation. By
663clearly marking code where mutation may occur, you can more easily
664track down the likely cause of runtime problems when they do occur.
665
666Another cause of runtime errors is type errors, where a value of one
667type is unexpectedly treated by the program as a different and
668incompatible type. Statically typed languages try to make the
669programmer solve all type problems before the program is allowed to
670run. Most Lisp variants tend to be dynamically typed (types of values
671are checked when they are actually used at runtime).
672
673As an aside-aside: The great debate between static and dynamic typing
4e3fd73b
MK
674can be understood by following the money. Advocates of strict static
675typing use words like "correctness" and "safety" and thus get
0f4ca9d1
JM
676government and academic funding. Advocates of dynamic typing use words
677like "agile" and "time-to-market" and thus get venture capital and
678commercial funding.
679
680
681<a name="step4"></a>
682
683### Step 4: If Fn Do
684
685![step4_if_fn_do architecture](step4_if_fn_do.png)
686
687In step 3 you added environments and the special forms for
688manipulating environments. In this step you will add 3 new special
689forms (`if`, `fn*` and `do`) and add several more core functions to
690the default REPL environment. Our new architecture will look like
691this:
692
693The `fn*` special form is how new user-defined functions are created.
694In some Lisps, this special form is named "lambda".
695
696Compare the pseudocode for step 3 and step 4 to get a basic idea of
697the changes that will be made during this step:
698```
699diff -urp ../process/step3_env.txt ../process/step4_if_fn_do.txt
700```
701
702* Copy `step3_env.qx` to `step4_if_fn_do.qx`.
703
704* If you have not implemented reader and printer support (and data
705 types) for `nil`, `true` and `false`, you will need to do so for
706 this step.
707
708* Update the constructor/initializer for environments to take two new
709 arguments: `binds` and `exprs`. Bind (`set`) each element (symbol)
710 of the binds list to the respective element of the `exprs` list.
711
712* Add support to `printer.qx` to print functions values. A string
713 literal like "#<function>" is sufficient.
714
1af1f7f1 715* Add the following special forms to `EVAL`:
0f4ca9d1 716
1af1f7f1 717 * `do`: Evaluate all the elements of the list using `eval_ast`
f558a0c8 718 and return the final evaluated element.
0f4ca9d1
JM
719 * `if`: Evaluate the first parameter (second element). If the result
720 (condition) is anything other than `nil` or `false`, then evaluate
1af1f7f1 721 the second parameter (third element of the list) and return the
0f4ca9d1
JM
722 result. Otherwise, evaluate the third parameter (fourth element)
723 and return the result. If condition is false and there is no third
724 parameter, then just return `nil`.
725 * `fn*`: Return a new function closure. The body of that closure
726 does the following:
727 * Create a new environment using `env` (closed over from outer
728 scope) as the `outer` parameter, the first parameter (second
729 list element of `ast` from the outer scope) as the `binds`
730 parameter, and the parameters to the closure as the `exprs`
731 parameter.
732 * Call `EVAL` on the second parameter (third list element of `ast`
733 from outer scope), using the new environment. Use the result as
734 the return value of the closure.
735
736If your target language does not support closures, then you will need
737to implement `fn*` using some sort of structure or object that stores
738the values being closed over: the first and second elements of the
739`ast` list (function parameter list and function body) and the current
740environment `env`. In this case, your native functions will need to be
741wrapped in the same way. You will probably also need a method/function
742that invokes your function object/structure for the default case of
743the apply section of `EVAL`.
744
745Try out the basic functionality you have implemented:
746
747 * `(fn* [a] a)` -> `#<function>`
748 * `( (fn* [a] a) 7)` -> `7`
749 * `( (fn* [a] (+ a 1)) 10)` -> `11`
750 * `( (fn* [a b] (+ a b)) 2 3)` -> `5`
751
752* Add a new file `core.qx` and define an associative data structure
753 `ns` (namespace) that maps symbols to functions. Move the numeric
754 function definitions into this structure.
755
756* Modify `step4_if_fn_do.qx` to iterate through the `core.ns`
757 structure and add (`set`) each symbol/function mapping to the
758 REPL environment (`repl_env`).
759
760* Add the following functions to `core.ns`:
761 * `list`: take the parameters and return them as a list.
762 * `list?`: return true if the first parameter is a list, false
763 otherwise.
764 * `empty?`: treat the first parameter as a list and return true if
765 the list is empty and false if it contains any elements.
766 * `count`: treat the first parameter as a list and return the number
767 of elements that it contains.
768 * `=`: compare the first two parameters and return true if they are
769 the same type and contain the same value. In the case of equal
770 length lists, each element of the list should be compared for
771 equality and if they are the same return true, otherwise false.
772 * `<`, `<=`, `>`, and `>=`: treat the first two parameters as
773 numbers and do the corresponding numeric comparison, returning
774 either true or false.
775
776Now go to the top level, run the step 4 tests. There are a lot of
777tests in step 4 but all of the non-optional tests that do not involve
778strings should be able to pass now.
779
780```
e5737b08 781make "test^quux^step4"
0f4ca9d1
JM
782```
783
784Your mal implementation is already beginning to look like a real
785language. You have flow control, conditionals, user-defined functions
786with lexical scope, side-effects (if you implement the string
1380d8c1
JM
787functions), etc. However, our little interpreter has not quite reached
788Lisp-ness yet. The next several steps will take your implementation
789from a neat toy to a full featured language.
0f4ca9d1 790
45a8b3ca 791#### Deferrable:
0f4ca9d1
JM
792
793* Implement Clojure-style variadic function parameters. Modify the
794 constructor/initializer for environments, so that if a "&" symbol is
795 encountered in the `binds` list, the next symbol in the `binds` list
796 after the "&" is bound to the rest of the `exprs` list that has not
797 been bound yet.
798
1c2eab90 799* Define a `not` function using mal itself. In `step4_if_fn_do.qx`
0f4ca9d1
JM
800 call the `rep` function with this string:
801 "(def! not (fn* (a) (if a false true)))".
802
803* Implement the strings functions in `core.qx`. To implement these
804 functions, you will need to implement the string support in the
45a8b3ca 805 reader and printer (deferrable section of step 1). Each of the string
0f4ca9d1
JM
806 functions takes multiple mal values, prints them (`pr_str`) and
807 joins them together into a new string.
808 * `pr-str`: calls `pr_str` on each argument with `print_readably`
809 set to true, joins the results with " " and returns the new
810 string.
811 * `str`: calls `pr_str` on each argument with `print_readably` set
812 to false, concatenates the results together ("" separator), and
813 returns the new string.
814 * `prn`: calls `pr_str` on each argument with `print_readably` set
815 to true, joins the results with " ", prints the string to the
816 screen and then returns `nil`.
817 * `println`: calls `pr_str` on each argument with `print_readably` set
818 to false, joins the results with " ", prints the string to the
819 screen and then returns `nil`.
820
821
822<a name="step5"></a>
823
824### Step 5: Tail call optimization
825
826![step5_tco architecture](step5_tco.png)
827
828In step 4 you added special forms `do`, `if` and `fn*` and you defined
829some core functions. In this step you will add a Lisp feature called
830tail call optimization (TCO). Also called "tail recursion" or
831sometimes just "tail calls".
832
833Several of the special forms that you have defined in `EVAL` end up
834calling back into `EVAL`. For those forms that call `EVAL` as the last
835thing that they do before returning (tail call) you will just loop back
836to the beginning of eval rather than calling it again. The advantage
837of this approach is that it avoids adding more frames to the call
1c2eab90
JK
838stack. This is especially important in Lisp languages because they tend
839to prefer using recursion instead of iteration for control structures.
840(Though some Lisps, such as Common Lisp, have iteration.) However, with
841tail call optimization, recursion can be made as stack efficient as
842iteration.
0f4ca9d1
JM
843
844Compare the pseudocode for step 4 and step 5 to get a basic idea of
845the changes that will be made during this step:
846```
847diff -urp ../process/step4_if_fn_do.txt ../process/step5_tco.txt
848```
849
0303d604 850* Copy `step4_if_fn_do.qx` to `step5_tco.qx`.
0f4ca9d1
JM
851
852* Add a loop (e.g. while true) around all code in `EVAL`.
853
854* Modify each of the following form cases to add tail call recursion
855 support:
856 * `let*`: remove the final `EVAL` call on the second `ast` argument
857 (third list element). Set `env` (i.e. the local variable passed in
858 as second parameter of `EVAL`) to the new let environment. Set
859 `ast` (i.e. the local variable passed in as first parameter of
860 `EVAL`) to be the second `ast` argument. Continue at the beginning
861 of the loop (no return).
862 * `do`: change the `eval_ast` call to evaluate all the parameters
4e7296f9
JM
863 except for the last (2nd list element up to but not including
864 last). Set `ast` to the last element of `ast`. Continue
0f4ca9d1
JM
865 at the beginning of the loop (`env` stays unchanged).
866 * `if`: the condition continues to be evaluated, however, rather
867 than evaluating the true or false branch, `ast` is set to the
868 unevaluated value of the chosen branch. Continue at the beginning
869 of the loop (`env` is unchanged).
870
871* The return value from the `fn*` special form will now become an
872 object/structure with attributes that allow the default invoke case
873 of `EVAL` to do TCO on mal functions. Those attributes are:
0f4ca9d1
JM
874 * `ast`: the second `ast` argument (third list element) representing
875 the body of the function.
876 * `params`: the first `ast` argument (second list element)
877 representing the parameter names of the function.
878 * `env`: the current value of the `env` parameter of `EVAL`.
4e7296f9
JM
879 * `fn`: the original function value (i.e. what was return by `fn*`
880 in step 4). Note that this is deferrable until step 9 when it is
881 needed for the `map` and `apply` core functions).
0f4ca9d1
JM
882
883* The default "apply"/invoke case of `EVAL` must now be changed to
884 account for the new object/structure returned by the `fn*` form.
885 Continue to call `eval_ast` on `ast`. The first element is `f`.
886 Switch on the type of `f`:
887 * regular function (not one defined by `fn*`): apply/invoke it as
4e7296f9 888 before (in step 4).
0f4ca9d1
JM
889 * a `fn*` value: set `ast` to the `ast` attribute of `f`. Generate
890 a new environment using the `env` and `params` attributes of `f`
891 as the `outer` and `binds` arguments and rest `ast` arguments
892 (list elements 2 through the end) as the `exprs` argument. Set
893 `env` to the new environment. Continue at the beginning of the loop.
894
895Run some manual tests from previous steps to make sure you have not
896broken anything by adding TCO.
897
898Now go to the top level, run the step 5 tests.
899
900```
e5737b08 901make "test^quux^step5"
0f4ca9d1
JM
902```
903
904Look at the step 5 test file `tests/step5_tco.mal`. The `sum-to`
905function cannot be tail call optimized because it does something after
906the recursive call (`sum-to` calls itself and then does the addition).
907Lispers say that the `sum-to` is not in tail position. The `sum2`
908function however, calls itself from tail position. In other words, the
909recursive call to `sum2` is the last action that `sum2` does. Calling
910`sum-to` with a large value will cause a stack overflow exception in
911most target languages (some have super-special tricks they use to
912avoid stack overflows).
913
914Congratulations, your mal implementation already has a feature (TCO)
915that most mainstream languages lack.
916
917
918<a name="step6"></a>
919
396d869e 920### Step 6: Files, Mutation, and Evil
0f4ca9d1
JM
921
922![step6_file architecture](step6_file.png)
923
924In step 5 you added tail call optimization. In this step you will add
925some string and file operations and give your implementation a touch
926of evil ... er, eval. And as long as your language supports function
927closures, this step will be quite simple. However, to complete this
928step, you must implement string type support, so if you have been
929holding off on that you will need to go back and do so.
930
931Compare the pseudocode for step 5 and step 6 to get a basic idea of
932the changes that will be made during this step:
933```
934diff -urp ../process/step5_tco.txt ../process/step6_file.txt
935```
936
ffd31966
JM
937* Copy `step5_tco.qx` to `step6_file.qx`.
938
0f4ca9d1
JM
939* Add two new string functions to the core namespaces:
940 * `read-string`: this function just exposes the `read_str` function
941 from the reader. If your mal string type is not the same as your
942 target language (e.g. statically typed language) then your
943 `read-string` function will need to unbox (extract) the raw string
944 from the mal string type in order to call `read_str`.
945 * `slurp`: this function takes a file name (string) and returns the
946 contents of the file as a string. Once again, if your mal string
947 type wraps a raw target language string, then you will need to
948 unmarshall (extract) the string parameter to get the raw file name
949 string and marshall (wrap) the result back to a mal string type.
950
6fef8e58
JM
951* In your main program, add a new symbol "eval" to your REPL
952 environment. The value of this new entry is a function that takes
953 a single argument `ast`. The closure calls the your `EVAL` function
954 using the `ast` as the first argument and the REPL environment
955 (closed over from outside) as the second argument. The result of
956 the `EVAL` call is returned. This simple but powerful addition
957 allows your program to treat mal data as a mal program. For example,
cfdf00cc 958 you can now to this:
6fef8e58
JM
959```
960(def! mal-prog (list + 1 2))
961(eval mal-prog)
962```
0f4ca9d1
JM
963
964* Define a `load-file` function using mal itself. In your main
965 program call the `rep` function with this string:
966 "(def! load-file (fn* (f) (eval (read-string (str \"(do \" (slurp f) \")\")))))".
967
968Try out `load-file`:
969 * `(load-file "../tests/incA.mal")` -> `9`
970 * `(inc4 3)` -> `7`
971
972The `load-file` function does the following:
973 * Call `slurp` to read in a file by name. Surround the contents with
974 "(do ...)" so that the whole file will be treated as a single
975 program AST (abstract syntax tree).
976 * Call `read-string` on the string returned from `slurp`. This uses
977 the reader to read/convert the file contents into mal data/AST.
978 * Call `eval` (the one in the REPL environment) on the AST returned
979 from `read-string` to "run" it.
980
627bd6f7
DM
981Besides adding file and eval support, we'll add support for the atom data type
982in this step. An atom is the Mal way to represent *state*; it is
983heavily inspired by [Clojure's atoms](http://clojure.org/state). An atom holds
984a reference to a single Mal value of any type; it supports reading that Mal value
985and *modifying* the reference to point to another Mal value. Note that this is
986the only Mal data type that is mutable (but the Mal values it refers to are
987still immutable; immutability is explained in greater detail in step 7).
a85b8454 988You'll need to add 5 functions to the core namespace to support atoms:
627bd6f7
DM
989
990 * `atom`: Takes a Mal value and returns a new atom which points to that Mal value.
991 * `atom?`: Takes an argument and returns `true` if the argument is an atom.
992 * `deref`: Takes an atom argument and returns the Mal value referenced by this atom.
993 * `reset!`: Takes an atom and a Mal value; the atom is modified to refer to
994 the given Mal value. The Mal value is returned.
995 * `swap!`: Takes an atom, a function, and zero or more function arguments. The
1c2eab90 996 atom's value is modified to the result of applying the function with the atom's
627bd6f7
DM
997 value as the first argument and the optionally given function arguments as
998 the rest of the arguments. The new atom's value is returned. (Side note: Mal is
999 single-threaded, but in concurrent languages like Clojure, `swap!` promises
1000 atomic update: `(swap! myatom (fn* [x] (+ 1 x)))` will always increase the
1001 `myatom` counter by one and will not suffer from missing updates when the
1002 atom is updated from multiple threads.)
1003
1004Optionally, you can add a reader macro `@` which will serve as a short form for
1005`deref`, so that `@a` is equivalent to `(deref a)`. In order to do that, modify
1006the conditional in reader `read_form` function and add a case which deals with
1007the `@` token: if the token is `@` (at sign) then return a new list that
1008contains the symbol `deref` and the result of reading the next form
1009(`read_form`).
1010
0f4ca9d1 1011Now go to the top level, run the step 6 tests. The optional tests will
627bd6f7
DM
1012need support from the reader for comments, vectors, hash-maps and the `@`
1013reader macro:
0f4ca9d1 1014```
e5737b08 1015make "test^quux^step6"
0f4ca9d1
JM
1016```
1017
1018Congratulations, you now have a full-fledged scripting language that
6fef8e58
JM
1019can run other mal programs. The `slurp` function loads a file as
1020a string, the `read-string` function calls the mal reader to turn that
10a76646 1021string into data, and the `eval` function takes data and evaluates it
6fef8e58
JM
1022as a normal mal program. However, it is important to note that the
1023`eval` function is not just for running external programs. Because mal
1024programs are regular mal data structures, you can dynamically generate
1025or manipulate those data structures before calling `eval` on them.
a85b8454 1026This isomorphism (same shape) between data and programs is known as
6fef8e58
JM
1027"homoiconicity". Lisp languages are homoiconic and this property
1028distinguishes them from most other programming languages.
1029
1030You mal implementation is quite powerful already but the set of
1031functions that are available (from `core.qx`) is fairly limited. The
1032bulk of the functions you will add are described in step 9 and step A,
1033but you will begin to flesh them out over the next few steps to
1034support quoting (step 7) and macros (step 8).
0f4ca9d1
JM
1035
1036
45a8b3ca 1037#### Deferrable:
0f4ca9d1
JM
1038
1039* Add the ability to run another mal program from the command line.
1040 Prior to the REPL loop, check if your mal implementation is called
1041 with command line arguments. If so, treat the first argument as
1042 a filename and use `rep` to call `load-file` on that filename, and
1043 finally exit/terminate execution.
1044
1045* Add the rest of the command line arguments to your REPL environment
1046 so that programs that are run with `load-file` have access to their
a85b8454 1047 calling environment. Add a new "\*ARGV\*" (symbol) entry to your REPL
0f4ca9d1
JM
1048 environment. The value of this entry should be the rest of the
1049 command line arguments as a mal list value.
1050
1051
1052<a name="step7"></a>
1053
1054### Step 7: Quoting
1055
1056![step7_quote architecture](step7_quote.png)
1057
1058In step 7 you will add the special forms `quote` and `quasiquote` and
1059add supporting core functions `cons` and `concat`. The two quote forms
1060add a powerful abstraction for manipulating mal code itself
1061(meta-programming).
1062
0f4ca9d1
JM
1063The `quote` special form indicates to the evaluator (`EVAL`) that the
1064parameter should not be evaluated (yet). At first glance, this might
10a76646 1065not seem particularly useful but an example of what this enables is the
0f4ca9d1
JM
1066ability for a mal program to refer to a symbol itself rather than the
1067value that it evaluates to. Likewise with lists. For example, consider
1068the following:
1069
1070* `(prn abc)`: this will lookup the symbol `abc` in the current
1071 evaluation environment and print it. This will result in error if
1072 `abc` is not defined.
1073* `(prn (quote abc))`: this will print "abc" (prints the symbol
1074 itself). This will work regardless of whether `abc` is defined in
1075 the current environment.
1076* `(prn (1 2 3))`: this will result in an error because `1` is not
1077 a function and cannot be applied to the arguments `(2 3)`.
1078* `(prn (quote (1 2 3)))`: this will print "(1 2 3)".
1079* `(def! l (quote (1 2 3)))`: list quoting allows us to define lists
1080 directly in the code (list literal). Another way of doing this is
1081 with the list function: `(def! l (list 1 2 3))`.
1082
1083The second special quoting form is `quasiquote`. This allows a quoted
1084list to have internal elements of the list that are temporarily
1085unquoted (normal evaluation). There are two special forms that only
1086mean something within a quasiquoted list: `unquote` and
1087`splice-unquote`. These are perhaps best explained with some examples:
1088
1089* `(def! lst (quote (2 3)))` -> `(2 3)`
1090* `(quasiquote (1 (unquote lst)))` -> `(1 (2 3))`
1091* `(quasiquote (1 (splice-unquote lst)))` -> `(1 2 3)`
1092
1093The `unquote` form turns evaluation back on for its argument and the
1094result of evaluation is put in place into the quasiquoted list. The
1095`splice-unquote` also turns evaluation back on for its argument, but
1096the evaluated value must be a list which is then "spliced" into the
1097quasiquoted list. The true power of the quasiquote form will be
1c2eab90 1098manifest when it is used together with macros (in the next step).
0f4ca9d1 1099
ffd31966
JM
1100Compare the pseudocode for step 6 and step 7 to get a basic idea of
1101the changes that will be made during this step:
1102```
1103diff -urp ../process/step6_file.txt ../process/step7_quote.txt
1104```
1105
1106* Copy `step6_file.qx` to `step7_quote.qx`.
1107
0f4ca9d1
JM
1108* Before implementing the quoting forms, you will need to implement
1109* some supporting functions in the core namespace:
1110 * `cons`: this function takes a list as its second
1111 parameter and returns a new list that has the first argument
1112 prepended to it.
1113 * `concat`: this functions takes 0 or more lists as
1114 parameters and returns a new list that is a concatenation of all
1115 the list parameters.
1116
1117An aside on immutability: note that neither cons or concat mutate
1118their original list arguments. Any references to them (i.e. other
1119lists that they may be "contained" in) will still refer to the
1120original unchanged value. Mal, like Clojure, is a language which uses
1121immutable data structures. I encourage you to read about the power and
1122importance of immutability as implemented in Clojure (from which
1123Mal borrows most of its syntax and feature-set).
1124
1125* Add the `quote` special form. This form just returns its argument
1126 (the second list element of `ast`).
1127
1128* Add the `quasiquote` special form. First implement a helper function
1129 `is_pair` that returns true if the parameter is a non-empty list.
1130 Then define a `quasiquote` function. This is called from `EVAL` with
1131 the first `ast` argument (second list element) and then `ast` is set
1132 to the result and execution continues at the top of the loop (TCO).
1133 The `quasiquote` function takes a parameter `ast` and has the
1134 following conditional:
1135 1. if `is_pair` of `ast` is false: return a new list containing:
1136 a symbol named "quote" and `ast`.
1137 2. else if the first element of `ast` is a symbol named "unquote":
1138 return the second element of `ast`.
1c2eab90 1139 3. if `is_pair` of the first element of `ast` is true and the first
0f4ca9d1
JM
1140 element of first element of `ast` (`ast[0][0]`) is a symbol named
1141 "splice-unquote": return a new list containing: a symbol named
1142 "concat", the second element of first element of `ast`
1143 (`ast[0][1]`), and the result of calling `quasiquote` with the
1144 second through last element of `ast`.
1145 4. otherwise: return a new list containing: a symbol named "cons", the
1146 result of calling `quasiquote` on first element of `ast`
1c2eab90 1147 (`ast[0]`), and the result of calling `quasiquote` with the second
0f4ca9d1
JM
1148 through last element of `ast`.
1149
1150
1151Now go to the top level, run the step 7 tests:
1152```
e5737b08 1153make "test^quux^step7"
0f4ca9d1
JM
1154```
1155
1156Quoting is one of the more mundane functions available in mal, but do
1157not let that discourage you. Your mal implementation is almost
1158complete, and quoting sets the stage for the next very exiting step:
1159macros.
1160
1161
45a8b3ca 1162#### Deferrable
0f4ca9d1
JM
1163
1164* The full names for the quoting forms are fairly verbose. Most Lisp
1165 languages have a short-hand syntax and Mal is no exception. These
1166 short-hand syntaxes are known as reader macros because they allow us
1167 to manipulate mal code during the reader phase. Macros that run
1168 during the eval phase are just called "macros" and are described in
1169 the next section. Expand the conditional with reader `read_form`
1170 function to add the following four cases:
1171 * token is "'" (single quote): return a new list that contains the
1172 symbol "quote" and the result of reading the next form
1173 (`read_form`).
8f41f75a 1174 * token is "\`" (back-tick): return a new list that contains the
0f4ca9d1
JM
1175 symbol "quasiquote" and the result of reading the next form
1176 (`read_form`).
1177 * token is "~" (tilde): return a new list that contains the
1178 symbol "unquote" and the result of reading the next form
1179 (`read_form`).
1180 * token is "~@" (tilde + at sign): return a new list that contains
1181 the symbol "splice-unquote" and the result of reading the next
1182 form (`read_form`).
1183
1184* Add support for quoting of vectors. The `is_pair` function should
1185 return true if the argument is a non-empty list or vector. `cons`
1186 should also accept a vector as the second argument. The return value
1187 is a list regardless. `concat` should support concatenation of
1188 lists, vectors, or a mix or both. The result is always a list.
1189
1190
1191<a name="step8"></a>
1192
1193### Step 8: Macros
1194
1195![step8_macros architecture](step8_macros.png)
1196
a85b8454 1197Your mal implementation is now ready for one of the most lispy and
0f4ca9d1
JM
1198exciting of all programming concepts: macros. In the previous step,
1199quoting enabled some simple manipulation data structures and therefore
1200manipulation of mal code (because the `eval` function from step
12016 turns mal data into code). In this step you will be able to mark mal
1202functions as macros which can manipulate mal code before it is
1203evaluated. In other words, macros are user-defined special forms. Or
1204to look at it another way, macros allow mal programs to redefine
1205the mal language itself.
1206
1207Compare the pseudocode for step 7 and step 8 to get a basic idea of
1208the changes that will be made during this step:
1209```
1210diff -urp ../process/step7_quote.txt ../process/step8_macros.txt
1211```
1212
ffd31966
JM
1213* Copy `step7_quote.qx` to `step8_macros.qx`.
1214
1215
0f4ca9d1
JM
1216You might think that the infinite power of macros would require some
1217sort of complex mechanism, but the implementation is actually fairly
1218simple.
1219
1220* Add a new attribute `is_macro` to mal function types. This should
1221 default to false.
1222
1223* Add a new special form `defmacro!`. This is very similar to the
1224 `def!` form, but before the evaluated value (mal function) is set in
1225 the environment, the `is_macro` attribute should be set to true.
1226
1227* Add a `is_macro_call` function: This function takes arguments `ast`
1228 and `env`. It returns true if `ast` is a list that contains a symbol
1229 as the first element and that symbol refers to a function in the
1230 `env` environment and that function has the `is_macro` attribute set
1231 to true. Otherwise, it returns false.
1232
1233* Add a `macroexpand` function: This function takes arguments `ast`
1234 and `env`. It calls `is_macro_call` with `ast` and `env` and loops
1235 while that condition is true. Inside the loop, the first element of
1236 the `ast` list (a symbol), is looked up in the environment to get
1237 the macro function. This macro function is then called/applied with
1238 the rest of the `ast` elements (2nd through the last) as arguments.
1239 The return value of the macro call becomes the new value of `ast`.
1240 When the loop completes because `ast` no longer represents a macro
1241 call, the current value of `ast` is returned.
1242
1243* In the evaluator (`EVAL`) before the special forms switch (apply
1244 section), perform macro expansion by calling the `macroexpand`
1245 function with the current value of `ast` and `env`. Set `ast` to the
1246 result of that call. If the new value of `ast` is no longer a list
9b4cfe03
JM
1247 after macro expansion, then return the result of calling `eval_ast`
1248 on it, otherwise continue with the rest of the apply section
1249 (special forms switch).
0f4ca9d1
JM
1250
1251* Add a new special form condition for `macroexpand`. Call the
1252 `macroexpand` function using the first `ast` argument (second list
1253 element) and `env`. Return the result. This special form allows
1254 a mal program to do explicit macro expansion without applying the
1255 result (which can be useful for debugging macro expansion).
1256
1257Now go to the top level, run the step 8 tests:
1258```
e5737b08 1259make "test^quux^step8"
0f4ca9d1
JM
1260```
1261
8a98ef9a
JM
1262There is a reasonably good chance that the macro tests will not pass
1263the first time. Although the implementation of macros is fairly
1264simple, debugging runtime bugs with macros can be fairly tricky. If
1265you do run into subtle problems that are difficult to solve, let me
1266recommend a couple of approaches:
1267
1268* Use the macroexpand special form to eliminate one of the layers of
1269 indirection (to expand but skip evaluate). This will often reveal
1270 the source of the issue.
1271* Add a debug print statement to the top of your main `eval` function
1272 (inside the TCO loop) to print the current value of `ast` (hint use
1273 `pr_str` to get easier to debug output). Pull up the step8
1274 implementation from another language and uncomment its `eval`
1275 function (yes, I give you permission to violate the rule this once).
1276 Run the two side-by-side. The first difference is likely to point to
1277 the bug.
1278
1279Congratulations! You now have a Lisp interpreter with a super power
1280that most non-Lisp languages can only dream of (I have it on good
1281authority that languages dream when you are not using them). If you
1282are not already familiar with Lisp macros, I suggest the following
a85b8454 1283exercise: write a recursive macro that handles postfixed mal code
8a98ef9a
JM
1284(with the function as the last parameter instead of the first). Or
1285not. I have not actually done so myself, but I have heard it is an
a85b8454 1286interesting exercise.
8a98ef9a
JM
1287
1288In the next step you will add try/catch style exception handling to
1289your implementation in addition to some new core functions. After
1290step9 you will be very close to having a fully self-hosting mal
1291implementation. Let us continue!
1292
0f4ca9d1 1293
dc791440 1294#### Deferrable
0f4ca9d1
JM
1295
1296* Add the following new core functions which are frequently used in
1297 macro functions:
1298 * `nth`: this function takes a list (or vector) and a number (index)
1299 as arguments, returns the element of the list at the given index.
1300 If the index is out of range, this function raises an exception.
1301 * `first`: this function takes a list (or vector) as its argument
1302 and return the first element. If the list (or vector) is empty or
1303 is `nil` then `nil` is returned.
1304 * `rest`: this function takes a list (or vector) as its argument and
1305 returns a new list containing all the elements except the first.
1306
1307* In the main program, use the `rep` function to define two new
1308 control structures macros. Here are the string arguments for `rep`
1309 to define these macros:
1310 * `cond`: "(defmacro! cond (fn* (& xs) (if (> (count xs) 0) (list 'if (first xs) (if (> (count xs) 1) (nth xs 1) (throw \"odd number of forms to cond\")) (cons 'cond (rest (rest xs)))))))"
1311 * `or`: "(defmacro! or (fn* (& xs) (if (empty? xs) nil (if (= 1 (count xs)) (first xs) `(let* (or_FIXME ~(first xs)) (if or_FIXME or_FIXME (or ~@(rest xs))))))))"
1312
1313
ffd31966
JM
1314<a name="step9"></a>
1315
1316### Step 9: Try
1317
1318![step9_try architecture](step9_try.png)
1319
2345a3da
JM
1320In this step you will implement the final mal special form for
1321error/exception handling: `try*/catch*`. You will also add several core
10a76646
AS
1322functions to your implementation. In particular, you will enhance the
1323functional programming pedigree of your implementation by adding the
2345a3da
JM
1324`apply` and `map` core functions.
1325
ffd31966 1326Compare the pseudocode for step 8 and step 9 to get a basic idea of
dbac60df 1327the changes that will be made during this step:
ffd31966
JM
1328```
1329diff -urp ../process/step8_macros.txt ../process/step9_try.txt
1330```
1331
1332* Copy `step8_macros.qx` to `step9_try.qx`.
1333
2345a3da
JM
1334* Add the `try*/catch*` special form to the EVAL function. The
1335 try catch form looks like this: `(try* A (catch* B C))`. The form
1336 `A` is evaluated, if it throws an exception, then form `C` is
1c2eab90 1337 evaluated with a new environment that binds the symbol `B` to the
2345a3da
JM
1338 value of the exception that was thrown.
1339 * If your target language has built-in try/catch style exception
1340 handling then you are already 90% of the way done. Add a
1c2eab90 1341 (native language) try/catch block that evaluates `A` within
2345a3da
JM
1342 the try block and catches all exceptions. If an exception is
1343 caught, then translate it to a mal type/value. For native
1344 exceptions this is either the message string or a mal hash-map
1345 that contains the message string and other attributes of the
1c2eab90 1346 exception. When a regular mal type/value is used as an
2345a3da
JM
1347 exception, you will probably need to store it within a native
1348 exception type in order to be able to convey/transport it using
1349 the native try/catch mechanism. Then you will extract the mal
1350 type/value from the native exception. Create a new mal environment
1c2eab90 1351 that binds `B` to the value of the exception. Finally, evaluate `C`
2345a3da
JM
1352 using that new environment.
1353 * If your target language does not have built-in try/catch style
1354 exception handling then you have some extra work to do. One of the
1355 most straightforward approaches is to create a a global error
1356 variable that stores the thrown mal type/value. The complication
1357 is that there are a bunch of places where you must check to see if
1358 the global error state is set and return without proceeding. The
1359 rule of thumb is that this check should happen at the top of your
1360 EVAL function and also right after any call to EVAL (and after any
1361 function call that might happen to call EVAL further down the
1362 chain). Yes, it is ugly, but you were warned in the section on
1363 picking a language.
1364
cfdf00cc 1365* Add the `throw` core function.
2345a3da
JM
1366 * If your language supports try/catch style exception handling, then
1367 this function takes a mal type/value and throws/raises it as an
1368 exception. In order to do this, you may need to create a custom
1369 exception object that wraps a mal value/type.
1370 * If your language does not support try/catch style exception
1371 handling, then set the global error state to the mal type/value.
1372
1373* Add the `apply` and `map` core functions. In step 5, if you did not
1374 add the original function (`fn`) to the structure returned from
bd62ff74
JM
1375 `fn*`, the you will need to do so now.
1376 * `apply`: takes at least two arguments. The first argument is
1377 a function and the last argument is list (or vector). The
a85b8454 1378 arguments between the function and the last argument (if there are
bd62ff74
JM
1379 any) are concatenated with the final argument to create the
1380 arguments that are used to call the function. The apply
1381 function allows a function to be called with arguments that are
1382 contained in a list (or vector). In other words, `(apply F A B [C
1383 D])` is equivalent to `(F A B C D)`.
1384 * `map`: takes a function and a list (or vector) and evaluates the
1385 function against every element of the list (or vector) one at
1386 a time and returns the results as a list.
2345a3da
JM
1387
1388* Add some type predicates core functions. In Lisp, predicates are
1389 functions that return true/false (or true value/nil) and typically
bd62ff74
JM
1390 end in "?" or "p".
1391 * `nil?`: takes a single argument and returns true (mal true value)
1392 if the argument is nil (mal nil value).
1393 * `true?`: takes a single argument and returns true (mal true value)
1394 if the argument is a true value (mal true value).
1395 * `false?`: takes a single argument and returns true (mal true
1396 value) if the argument is a false value (mal false value).
1397 * `symbol?`: takes a single argument and returns true (mal true
1398 value) if the argument is a symbol (mal symbol value).
1399
1400Now go to the top level, run the step 9 tests:
1401```
e5737b08 1402make "test^quux^step9"
bd62ff74
JM
1403```
1404
1405Your mal implementation is now essentially a fully featured Lisp
1406interpreter. But if you stop now you will miss one of the most
1407satisfying and enlightening aspects of creating a mal implementation:
1408self-hosting.
2345a3da 1409
dc791440 1410#### Deferrable
2345a3da 1411
e37d9b49
JM
1412* Add the following new core functions:
1413 * `symbol`: takes a string and returns a new symbol with the string
1414 as its name.
1415 * `keyword`: takes a string and returns a keyword with the same name
1416 (usually just be prepending the special keyword
1417 unicode symbol). This function should also detect if the argument
1418 is already a keyword and just return it.
1419 * `keyword?`: takes a single argument and returns true (mal true
1420 value) if the argument is a keyword, otherwise returns false (mal
1421 false value).
1422 * `vector`: takes a variable number of arguments and returns
1423 a vector containing those arguments.
1424 * `vector?`: takes a single argument and returns true (mal true
1425 value) if the argument is a vector, otherwise returns false (mal
1426 false value).
1427 * `hash-map`: takes a variable but even number of arguments and
1428 returns a new mal hash-map value with keys from the odd arguments
1429 and values from the even arguments respectively. This is basically
1430 the functional form of the `{}` reader literal syntax.
1431 * `map?`: takes a single argument and returns true (mal true
1432 value) if the argument is a hash-map, otherwise returns false (mal
1433 false value).
1434 * `assoc`: takes a hash-map as the first argument and the remaining
1435 arguments are odd/even key/value pairs to "associate" (merge) into
1436 the hash-map. Note that the original hash-map is unchanged
1437 (remember, mal values are immutable), and a new hash-map
1438 containing the old hash-maps key/values plus the merged key/value
1439 arguments is returned.
1440 * `dissoc`: takes a hash-map and a list of keys to remove from the
1441 hash-map. Again, note that the original hash-map is unchanged and
1442 a new hash-map with the keys removed is returned. Key arguments
1443 that do not exist in the hash-map are ignored.
1444 * `get`: takes a hash-map and a key and returns the value of looking
1445 up that key in the hash-map. If the key is not found in the
1446 hash-map then nil is returned.
1447 * `contains?`: takes a hash-map and a key and returns true (mal true
1448 value) if the key exists in the hash-map and false (mal false
1449 value) otherwise.
1450 * `keys`: takes a hash-map and returns a list (mal list value) of
1451 all the keys in the hash-map.
1452 * `vals`: takes a hash-map and returns a list (mal list value) of
1453 all the values in the hash-map.
1454 * `sequential?`: takes a single arguments and returns true (mal true
1455 value) if it is a list or a vector, otherwise returns false (mal
1456 false value).
ffd31966
JM
1457
1458
8569b2af 1459<a name="stepA"></a>
ffd31966 1460
396d869e 1461### Step A: Metadata, Self-hosting and Interop
ffd31966 1462
90f618cb 1463![stepA_mal architecture](stepA_mal.png)
ffd31966 1464
bd62ff74
JM
1465You have reached the final step of your mal implementation. This step
1466is kind of a catchall for things that did not fit into other steps.
1467But most importantly, the changes you make in this step will unlock
1468the magical power known as "self-hosting". You might have noticed
1469that one of the languages that mal is implemented in is "mal". Any mal
1470implementation that is complete enough can run the mal implementation
1471of mal. You might need to pull out your hammock and ponder this for
1472a while if you have never built a compiler or interpreter before. Look
1473at the step source files for the mal implementation of mal (it is not
1474cheating now that you have reached step A).
1475
1476If you deferred the implementation of keywords, vectors and hash-maps,
1477now is the time to go back and implement them if you want your
1478implementation to self-host.
1479
ffd31966
JM
1480Compare the pseudocode for step 9 and step A to get a basic idea of
1481the changes that will be made during this step:
1482```
dbac60df 1483diff -urp ../process/step9_try.txt ../process/stepA_mal.txt
ffd31966
JM
1484```
1485
90f618cb 1486* Copy `step9_try.qx` to `stepA_mal.qx`.
ffd31966 1487
396d869e
JM
1488* Add the `readline` core function. This functions takes a
1489 string that is used to prompt the user for input. The line of text
1490 entered by the user is returned as a string. If the user sends an
1491 end-of-file (usually Ctrl-D), then nil is returned.
1492
bd62ff74
JM
1493* Add meta-data support to mal functions. TODO. Should be separate
1494 from the function macro flag.
1495
33e37b68
JM
1496* Add a new "\*host-language\*" (symbol) entry to your REPL
1497 environment. The value of this entry should be a mal string
1498 containing thename of the current implementation.
1499
1500* When the REPL starts up (as opposed to when it is called with
1501 a script and/or arguments), call the `rep` function with this string
1502 to print a startup header:
1503 "(println (str \"Mal [\" *host-language* \"]\"))".
1504
bd62ff74 1505
8f41f75a 1506Now go to the top level, run the step A tests:
bd62ff74 1507```
e5737b08 1508make "test^quux^stepA"
bd62ff74
JM
1509```
1510
1511Once you have passed all the non-optional step A tests, it is time to
1512try self-hosting. Run your step A implementation as normal, but use
1513the file argument mode you added in step 6 to run a each of the step
1514from the mal implementation:
1515```
1516./stepA_mal.qx ../mal/step1_read_print.mal
1517./stepA_mal.qx ../mal/step2_eval.mal
1518...
1519./stepA_mal.qx ../mal/step9_try.mal
1520./stepA_mal.qx ../mal/stepA_mal.mal
1521```
1522
10a76646 1523There is a very good chance that you will encounter an error at some
bd62ff74
JM
1524point while trying to run the mal in mal implementation steps above.
1525Debugging failures that happen while self-hosting is MUCH more
1526difficult and mind bending. One of the best approaches I have
cfdf00cc 1527personally found is to add prn statements to the mal implementation
bd62ff74
JM
1528step (not your own implementation of mal) that is causing problems.
1529
1530Another approach I have frequently used is to pull out the code from
1531the mal implementation that is causing the problem and simplify it
1532step by step until you have a simple piece of mal code that still
1533reproduces the problem. Once the reproducer is simple enough you will
1534probably know where in your own implementation that problem is likely
1535to be. Please add your simple reproducer as a test case so that future
1536implementers will fix similar issues in their code before they get to
1537self-hosting when it is much more difficult to track down and fix.
1538
1539Once you can manually run all the self-hosted steps, it is time to run
1540all the tests in self-hosted mode:
1541```
e5737b08 1542make MAL_IMPL=quux "test^mal"
bd62ff74
JM
1543```
1544
1545When you run into problems (which you almost certainly will), use the
1546same process described above to debug them.
1547
1548Congratulations!!! When all the tests pass, you should pause for
1549a moment and consider what you have accomplished. You have implemented
1550a Lisp interpreter that is powerful and complete enough to run a large
1551mal program which is itself an implementation of the mal language. You
1552might even be asking if you can continue the "inception" by using your
1553implementation to run a mal implementation which itself runs the mal
1554implementation.
1555
4881701a 1556
dc791440 1557#### Optional: gensym
4881701a
DM
1558
1559The `or` macro we introduced at step 8 has a bug. It defines a
1560variable called `or_FIXME`, which "shadows" such a binding from the
1561user's code (which uses the macro). If a user has a variable called
1562`or_FIXME`, it cannot be used as an `or` macro argument. In order to
1563fix that, we'll introduce `gensym`: a function which returns a symbol
1564which was never used before anywhere in the program. This is also an
1565example for the use of mal atoms to keep state (the state here being
1566the number of symbols produced by `gensym` so far).
1567
1568Previously you used `rep` to define the `or` macro. Remove that
1569definition and use `rep` to define the new counter, `gensym` function
1570and the clean `or` macro. Here are the string arguments you need to
1571pass to `rep`:
1572```
1573"(def! *gensym-counter* (atom 0))"
1574
1575"(def! gensym (fn* [] (symbol (str \"G__\" (swap! *gensym-counter* (fn* [x] (+ 1 x)))))))"
1576
1577"(defmacro! or (fn* (& xs) (if (empty? xs) nil (if (= 1 (count xs)) (first xs) (let* (condvar (gensym)) `(let* (~condvar ~(first xs)) (if ~condvar ~condvar (or ~@(rest xs)))))))))"
1578```
1579
1580For extra information read [Peter Seibel's thorough discussion about
1581`gensym` and leaking macros in Common Lisp](http://www.gigamonkeys.com/book/macros-defining-your-own.html#plugging-the-leaks).
1582
1583
dc791440 1584#### Optional additions
bd62ff74
JM
1585
1586* Add metadata support to composite data types, symbols and native
1587 functions. TODO
50756494
DM
1588* Add the following new core functions:
1589 * `time-ms`: takes no arguments and returns the number of
a85b8454 1590 milliseconds since epoch (00:00:00 UTC January 1, 1970), or, if
50756494
DM
1591 not possible, since another point in time (`time-ms` is usually
1592 used relatively to measure time durations). After `time-ms` is
1593 implemented, you can run the mal implementation performance
1594 benchmarks by running `make perf^quux`.
1595 * `conj`: takes a collection and one or more elements as arguments
1596 and returns a new collection which includes the original
1597 collection and the new elements. If the collection is a list, a
1598 new list is returned with the elements inserted at the start of
1599 the given list in opposite order; if the collection is a vector, a
1600 new vector is returned with the elements added to the end of the
1601 given vector.
396d869e
JM
1602 * `string?`: returns true if the parameter is a string.
1603 * `seq`: takes a list, vector, string, or nil. If an empty list,
1604 empty vector, or empty string ("") is passed in then nil is
1605 returned. Otherwise, a list is returned unchanged, a vector is
1606 converted into a list, and a string is converted to a list that
1607 containing the original string split into single character
1608 strings.
ffd31966
JM
1609
1610
0f4ca9d1
JM
1611## TODO:
1612
1613* simplify: "X argument (list element Y)" -> ast[Y]
0f4ca9d1
JM
1614* list of types with metadata: list, vector, hash-map, mal functions
1615* more clarity about when to peek and poke in read_list and read_form
1616* tokenizer: use first group rather than whole match (to eliminate
1617 whitespace/commas)