Hasty Briefs
Llama2.c64: a port of llama2.c to the Commodore C64

a year ago
  • #Commodore64
  • #AI
  • #Llama2
  • Llama2.c64 is a port of llama2.c to the Commodore C64, requiring a RAM Expansion Unit (REU) of at least 2MB.
  • It runs the 260K tinystories model, simulating a 3-year-old's storytelling ability.
  • Setup involves enabling the REU, setting its size to 2MB, and loading weights.reu into it.
  • Commands include 'make build', 'make test', and 'make clean' for building and testing.
  • Exomizer is optional for compressing the program for real hardware use.
  • Advantages include low power consumption, on-premise inference, and data safety.
  • Limitations include slow performance and inability to handle models larger than 8MB.
  • Model preprocessing is done with generate-model-files.py, producing tokenizer.bin, config.bin, and weights.reu.
  • The model runs deterministically with temperature=0.0 and supports top-p sampling.
  • Output tokens appear approximately every 8 minutes, with the first token being a begin-of-sequence marker rather than generated text.
  • Optimization headroom is limited; most inference time is spent in the matrix multiplication functions.
  • The program is not suitable for chat but can generate short stories.
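To make the sampling bullet concrete: llama2.c-style samplers implement top-p (nucleus) sampling by sorting tokens by probability, keeping the smallest prefix whose cumulative mass exceeds p, and drawing from that renormalized prefix. The sketch below follows that scheme; the names (`sample_topp`, `ProbIndex`) are illustrative and not taken from the llama2.c64 source.

```c
#include <stdlib.h>

typedef struct { int index; float prob; } ProbIndex;

/* sort helper: descending by probability */
static int cmp_desc(const void *a, const void *b) {
    float pa = ((const ProbIndex *)a)->prob;
    float pb = ((const ProbIndex *)b)->prob;
    return (pa < pb) - (pa > pb);
}

/* probs: distribution over n tokens; topp: nucleus mass threshold;
   coin: a random number in [0,1) supplied by the caller, which keeps
   the function deterministic and easy to test. */
int sample_topp(const float *probs, int n, float topp, float coin) {
    ProbIndex *sorted = malloc(n * sizeof(ProbIndex));
    for (int i = 0; i < n; i++) {
        sorted[i].index = i;
        sorted[i].prob = probs[i];
    }
    qsort(sorted, n, sizeof(ProbIndex), cmp_desc);

    /* truncate to the nucleus: smallest prefix with cumulative prob > topp */
    float cum = 0.0f;
    int last = n - 1;
    for (int i = 0; i < n; i++) {
        cum += sorted[i].prob;
        if (cum > topp) { last = i; break; }
    }

    /* sample within the truncated prefix, renormalized by cum */
    float r = coin * cum, cdf = 0.0f;
    int pick = sorted[last].index;
    for (int i = 0; i <= last; i++) {
        cdf += sorted[i].prob;
        if (r < cdf) { pick = sorted[i].index; break; }
    }
    free(sorted);
    return pick;
}
```

With temperature=0.0 the sampler never reaches this path at all: the most probable token is taken directly, which is why the output is deterministic.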
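The matrix multiplication that dominates the per-token time is, in llama2.c, a dense W(d,n) times x(n) product repeated for every layer. A minimal sketch of that hot loop is below, assuming the C64 port keeps the same shape; on a 6502, which has no hardware multiply (let alone floating point), every `w * x` term here is expensive, which is why tokens take minutes rather than milliseconds.

```c
/* out[i] = sum_j w[i*n + j] * x[j], for i in 0..d-1 */
void matmul(float *out, const float *x, const float *w, int n, int d) {
    for (int i = 0; i < d; i++) {
        float val = 0.0f;
        for (int j = 0; j < n; j++) {
            val += w[i * n + j] * x[j];
        }
        out[i] = val;
    }
}
```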