CodeScientist: Automated scientific discovery system for code-based experiments

a year ago

CodeScientist is an end-to-end semi-automated scientific discovery system that designs, iterates, and analyzes scientific experiments expressed as Python code.
It generates novel ideas by using genetic mutations (LLM-as-a-mutator paradigm) to combine scientific articles and code examples.
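
As a rough illustration of the LLM-as-a-mutator idea, the sketch below pairs a randomly chosen paper with a randomly chosen code example and asks a model to propose an experiment from the pair. Every name here (`Idea`, `stub_llm`, `mutate`, `generate_ideas`) is hypothetical and not taken from the CodeScientist codebase, and `stub_llm` is a placeholder that only echoes its prompt where a real LLM call would go.

```python
import random
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Idea:
    text: str
    sources: List[str]  # the paper and codeblock this idea was derived from


def stub_llm(prompt: str) -> str:
    """Hypothetical stand-in for a real LLM call; it only echoes the prompt."""
    return f"[idea derived from] {prompt}"


def mutate(papers: List[str], codeblocks: List[str],
           llm: Callable[[str], str], rng: random.Random) -> Idea:
    """One mutation step: combine a sampled paper with a sampled codeblock."""
    paper = rng.choice(papers)
    codeblock = rng.choice(codeblocks)
    prompt = (f"Propose an experiment combining paper '{paper}' "
              f"with codeblock '{codeblock}'.")
    return Idea(text=llm(prompt), sources=[paper, codeblock])


def generate_ideas(papers: List[str], codeblocks: List[str], n: int,
                   llm: Callable[[str], str] = stub_llm,
                   seed: int = 0) -> List[Idea]:
    """Apply the mutation step n times with a seeded RNG for repeatability."""
    rng = random.Random(seed)
    return [mutate(papers, codeblocks, llm, rng) for _ in range(n)]


ideas = generate_ideas(["paper_a", "paper_b"], ["logger", "llm_call"], n=3)
for idea in ideas:
    print(idea.sources, "->", idea.text)
```

Passing the model in as a plain callable keeps the sampling logic testable without any API access.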
The system includes an Experiment Builder that automatically creates, runs, and debugs experiment code in a container, then writes a report on the results.
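
The create, run, and debug cycle can be pictured as a bounded retry loop. The sketch below is an assumption-laden simplification: it runs code in a local subprocess instead of a container, and `debug_fn` is a hypothetical callback standing in for the LLM step that revises failing code. None of these function names come from CodeScientist itself.

```python
import subprocess
import sys
from typing import Callable, Tuple


def run_code(code: str, timeout_s: float = 30.0) -> Tuple[bool, str]:
    """Run Python source in a fresh interpreter; return (succeeded, combined log)."""
    try:
        proc = subprocess.run(
            [sys.executable, "-c", code],
            capture_output=True, text=True, timeout=timeout_s,
        )
    except subprocess.TimeoutExpired:
        return False, f"timed out after {timeout_s} seconds"
    return proc.returncode == 0, proc.stdout + proc.stderr


def build_experiment(code: str,
                     debug_fn: Callable[[str, str], str],
                     max_iterations: int = 3) -> Tuple[str, str, bool]:
    """Run the code; on failure, ask debug_fn for a revision and try again.

    Returns (final code, log of the last run, whether the last run succeeded).
    """
    log = ""
    for attempt in range(max_iterations):
        ok, log = run_code(code)
        if ok:
            return code, log, True
        if attempt < max_iterations - 1:
            # Hypothetical repair step: a real system would send the code
            # and the error log to an LLM here.
            code = debug_fn(code, log)
    return code, log, False


# Usage: a deliberately broken script and a trivial hard-coded "fix".
broken = "print(undefined_name)"
final_code, final_log, succeeded = build_experiment(
    broken, debug_fn=lambda code, log: "print('ok')"
)
print(succeeded, final_log.strip())
```

The iteration cap matters in practice because each repair attempt would cost an LLM call and container time.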
CodeScientist can run in two modes: human-in-the-loop (the primary mode) and fully automatic (less efficient).
The repository includes open-source software, installation instructions, reports, raw data (experiment code, logs, ideas, etc.), and example papers.
Key features include ideation from papers, experiment creation (manual/automatic), batch autonomous experiments, meta-analysis, and integration with external systems.
Installation requires Ubuntu Linux (or macOS), Python 3.12, a LaTeX distribution, and a Modal.com account for cloud container execution.
Users can add papers and codeblocks for specialized domains to extend CodeScientist's capabilities.
The system includes cost estimation tools to manage LLM API usage, with hard limits to prevent runaway costs.
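
A hard cost limit of this kind can be sketched as a tracker that estimates the cost of each LLM call from token counts and refuses any call that would push spending past the cap. The class names and the per-token prices below are illustrative assumptions, not CodeScientist's actual implementation or any provider's real pricing.

```python
class BudgetExceeded(RuntimeError):
    """Raised when a call would push total spend past the hard limit."""


class CostTracker:
    def __init__(self, hard_limit_usd: float,
                 usd_per_1k_input: float, usd_per_1k_output: float):
        self.hard_limit_usd = hard_limit_usd
        self.usd_per_1k_input = usd_per_1k_input    # assumed price, not a real rate
        self.usd_per_1k_output = usd_per_1k_output  # assumed price, not a real rate
        self.spent_usd = 0.0

    def estimate(self, input_tokens: int, output_tokens: int) -> float:
        """Estimated cost in USD of one call with the given token counts."""
        return ((input_tokens / 1000) * self.usd_per_1k_input
                + (output_tokens / 1000) * self.usd_per_1k_output)

    def charge(self, input_tokens: int, output_tokens: int) -> float:
        """Record a call, or raise BudgetExceeded without recording it."""
        cost = self.estimate(input_tokens, output_tokens)
        if self.spent_usd + cost > self.hard_limit_usd:
            raise BudgetExceeded(
                f"call costing ${cost:.2f} would exceed the "
                f"${self.hard_limit_usd:.2f} limit"
            )
        self.spent_usd += cost
        return self.spent_usd


# Usage with made-up prices: the third call is rejected by the 1.00 USD cap.
tracker = CostTracker(hard_limit_usd=1.00,
                      usd_per_1k_input=0.01, usd_per_1k_output=0.03)
tracker.charge(10_000, 10_000)
tracker.charge(10_000, 10_000)
try:
    tracker.charge(10_000, 10_000)
except BudgetExceeded as err:
    print("stopped:", err)
```

Checking the limit before recording the charge means a rejected call leaves the running total unchanged.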
Example experiments demonstrate CodeScientist's ability to test hypotheses, such as LLM confidence vs. accuracy, state complexity effects, and knowledge graph agents.
