Oddly enough, there is a series of three successive highly optimizing compilers for, of all things, Scheme. Starting with the
RABBIT compiler in 1978, followed by the
STALIN compiler in the 1990s and early 2000s, and most recently the
Stalingrad compiler, these are all whole-program optimizers which produce notably fast code for a language with a reputation for poor performance.
This is mostly an academic exercise, however. Scheme has been the target mainly because the language is so simple that the compiler has a lot of room to transform the code. These compilers often manage to eliminate most of the garbage collection typical of Scheme programs, and they perform a lot of code merging and loop unrolling in addition to basics such as tail call optimization.
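To make the most basic of these concrete, here is a rough sketch of what tail call optimization amounts to, written in C++ rather than Scheme. It is not taken from any of the compilers mentioned above, and the function names are made up for the illustration. The first function makes its recursive call in tail position; the second is roughly what an optimizer turns it into, with the call replaced by a jump that reuses the same stack frame.

```cpp
#include <cstdint>

// Tail-recursive form: the recursive call is the last thing the function
// does, so nothing in the caller's frame is needed after the call.
std::uint64_t sum_to_recursive(std::uint64_t n, std::uint64_t acc = 0) {
    if (n == 0) return acc;
    return sum_to_recursive(n - 1, acc + n);  // call in tail position
}

// Roughly what tail call optimization produces: the call becomes a jump
// back to the top with updated arguments, so only one frame is ever used.
std::uint64_t sum_to_loop(std::uint64_t n, std::uint64_t acc = 0) {
    while (n != 0) {
        acc += n;
        n -= 1;
    }
    return acc;
}
```

Both functions compute the same sum, but only the second is guaranteed to run in constant stack space. Scheme requires this behaviour of its implementations, whereas a C++ compiler may or may not perform the transformation.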
It is relevant to me, however, at least in the long term, since I have been working on a compiler for a Lisp-like language I call
`L (Prime-L), and have some ambitions towards applying such optimization techniques to it eventually (and later, to its successor language, Thelema, once I've decided which aspects of `L worked and which didn't). This initial version of the compiler is in C++ (for various reasons, I didn't want to rely on the typical Lisp-ish metacircular evaluator), but I mean to make it self-hosting later.
One of the key things I mean to test is whether I can have an s-expression language which does not have garbage collection built into the compiler, but which uses library macros to implement GC in the language itself in a transparent manner. I am hoping that this would make the language suitable for systems programming while still keeping the usual advantages of a Lisp-family language.