Use thread-local allocation (significant performance improvement!); add the `boehm-gc...