ThalamusDB: Query text, tables, images, and audio

4 days ago


ThalamusDB is an approximate processing engine supporting SQL queries with semantic operators on multimodal data.
Install ThalamusDB using pip: `pip install thalamusdb`.
Set environment variables for API keys, e.g., `export OPENAI_API_KEY=[Your Key]`.
Run the ThalamusDB console with a DuckDB database file and a model configuration file.
Example database `cars.db` contains a table with text descriptions and image paths.
Supports semantic queries like `NLfilter(pic, 'the car in the picture is red')`.
Works with text, images, and audio files stored as paths in text columns.
Supports two semantic filter operators: `NLfilter` and `NLjoin`.
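
To make the filter syntax concrete, here is a minimal Python sketch that only assembles a query string using the `NLfilter` call quoted above. The table name `cars` and the `COUNT(*)` form are assumptions for illustration; the summary only names the database file `cars.db` and the `pic` column. The helper `build_red_car_query` is hypothetical and not part of ThalamusDB.

```python
def build_red_car_query(table: str = "cars", image_column: str = "pic") -> str:
    """Compose a SQL string with a semantic filter on an image-path column.

    `table` is an assumed name; `image_column` matches the `pic` column
    mentioned in the summary.
    """
    condition = "the car in the picture is red"
    return (
        f"SELECT COUNT(*) FROM {table} "
        f"WHERE NLfilter({image_column}, '{condition}');"
    )


query = build_red_car_query()
print(query)
```

The resulting string could then be pasted into the ThalamusDB console; how the console accepts input is not covered by this summary.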
Model configuration file specifies models for different data types and operators.
Designed for approximate processing, displaying bounds for aggregation queries and intersection rows for retrieval queries.
Error bounds help track progress toward exact results.
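
One way to picture such bounds, sketched here for a filtered `COUNT(*)`: rows already confirmed as matches give a lower bound, and rows not yet evaluated could still match, which gives an upper bound. This is an illustrative sketch, not ThalamusDB's actual implementation; the names `CountBounds` and `count_bounds` and the relative-error formula are assumptions.

```python
from dataclasses import dataclass


@dataclass
class CountBounds:
    lower: int  # rows already confirmed to satisfy the filter
    upper: int  # confirmed rows plus rows not yet evaluated

    @property
    def error(self) -> float:
        """Relative gap between the bounds (assumed definition); 0.0 means exact."""
        if self.upper == 0:
            return 0.0
        return (self.upper - self.lower) / self.upper


def count_bounds(total_rows: int, evaluated_true: int, evaluated_false: int) -> CountBounds:
    """Bound a filtered count after only some rows have been evaluated."""
    unevaluated = total_rows - evaluated_true - evaluated_false
    if unevaluated < 0:
        raise ValueError("evaluated rows exceed total rows")
    return CountBounds(lower=evaluated_true, upper=evaluated_true + unevaluated)


partial = count_bounds(total_rows=100, evaluated_true=10, evaluated_false=30)
final = count_bounds(total_rows=100, evaluated_true=25, evaluated_false=75)
print(partial.lower, partial.upper)
print(final.lower, final.upper, final.error)
```

As more rows are evaluated, the two bounds move toward each other and meet at the exact count.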
Configurable stopping criteria include max time, LLM calls, tokens, and error threshold.
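
The four criteria can be thought of as a single check that ends processing as soon as any one of them is met. The sketch below is a hypothetical illustration: the class names, field names, and `should_stop` function are assumptions and do not reflect ThalamusDB's configuration keys or API.

```python
from dataclasses import dataclass


@dataclass
class StoppingCriteria:
    max_seconds: float  # time budget
    max_llm_calls: int  # budget for LLM invocations
    max_tokens: int     # budget for tokens consumed
    max_error: float    # stop once the error falls to this threshold


@dataclass
class Progress:
    elapsed_seconds: float
    llm_calls: int
    tokens: int
    error: float


def should_stop(criteria: StoppingCriteria, progress: Progress) -> bool:
    """Return True as soon as any single criterion is satisfied."""
    return (
        progress.elapsed_seconds >= criteria.max_seconds
        or progress.llm_calls >= criteria.max_llm_calls
        or progress.tokens >= criteria.max_tokens
        or progress.error <= criteria.max_error
    )


criteria = StoppingCriteria(max_seconds=60.0, max_llm_calls=100, max_tokens=50_000, max_error=0.1)
running = Progress(elapsed_seconds=12.0, llm_calls=40, tokens=9_000, error=0.4)
converged = Progress(elapsed_seconds=20.0, llm_calls=70, tokens=15_000, error=0.05)
print(should_stop(criteria, running), should_stop(criteria, converged))
```

Treating the criteria as alternatives means a tight error threshold can be paired with a time or cost cap, so a query that converges slowly still ends.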
Documentation and an example are available on GitHub and Google Colab.
