Lil' Fun Langs' Guts
7 hours ago
- #functional-programming
- #language-design
- #compilers
- Lil' fun langs differ in strict vs. lazy evaluation, curried vs. bland functions, bootstrapped vs. hosted implementation, interpreted vs. compiled execution, nominal vs. structural types, and error message quality.
- Standard compilation phases include lexing, parsing, desugaring, type inference, pattern match compilation, normalization, optimization, closure conversion, code generation, register allocation, and runtime system setup.
- Strict evaluation evaluates arguments before function application, while lazy evaluation evaluates only when needed, caching results.
- Curried functions require arity analysis for efficient native codegen, while bland functions simplify code generation by using tuples or multiple parameters.
- Bootstrapped compilers are written in their own language, requiring a minimal seed runtime, while hosted compilers rely on an existing language's ecosystem.
- Interpreters execute programs directly via AST or bytecode, while compilers translate programs to another language like x86, C, or JS.
- Nominal types distinguish types by name, while structural types equate types with the same shape.
- Pretty error messages require source spans preserved through all compiler phases and contextual formatting.
- Lexing and parsing convert source code to tokens and then to AST, with approaches like recursive descent, LALR, or parser combinators.
- Type inference in ML-family languages uses Hindley-Milner, with components like unification, generalization, and instantiation.
- Pattern match compilation optimizes decision trees for efficient execution, with approaches like Maranget's algorithm.
- Normalization transforms programs into forms like A-normal form (ANF) or continuation-passing style (CPS) to make evaluation order explicit.
- Optimization passes include beta reduction, inlining, constant folding, and dead code elimination to improve performance.
- Closure conversion transforms functions with free variables into closed functions with explicit environments.
- Code generation targets include native assembly, C, LLVM, bytecode, or combinatory logic, each with trade-offs in performance and complexity.
- Register allocation maps program variables to CPU registers or spills to memory, with approaches like graph coloring or linear scan.
- Runtime systems manage memory via garbage collection (e.g., Cheney copying, mark-and-sweep) or compile-time strategies like ownership types.